Chapter 1: Panel Data Models


1.1. Static Panel Data Models 
Panel data are repeated measurements on individuals (i) over time (t): a longitudinal dataset obtained by following a given sample of individual agents (households, firms, cities, regions, countries, etc.) over time.


Examples: 
Consumption function (data on households) 
Cost function (data on firms) 


Production function (data on firms) 


Regress $y_{it}$ on $x_{it}$ for $i=1,\dots,N$ and $t=1,\dots,T$.


id  year  yr92  yr93  yr94  DUM1  DUM2  y    x
1   1992  1     0     0     1     0     55   70
1   1993  0     1     0     1     0     50   68
1   1994  0     0     1     1     0     66   80
2   1992  1     0     0     0     1     77   94
2   1993  0     1     0     0     1     85   100
2   1994  0     0     1     0     1     90   123


If all N individuals are observed in all time periods, the panel is balanced. If there are missing observations, the panel is unbalanced. Analyzing unbalanced panel data typically raises few additional issues compared with the analysis of balanced data. However, if the panel is unbalanced for reasons that are not entirely random (e.g. because firms with relatively low levels of productivity have relatively high exit rates), then we may need to take this into account when estimating the model. This can be done by means of a sample selection model. We abstract from this particular problem here.


Repeated cross sections are not the same as panel data. Repeated cross 
sections are obtained by sampling from the same population at different 
points in time. The identity of the individuals (or firms, households etc.) is 
not recorded, and there is no attempt to follow individuals over time. This 
is the key reason why pooled cross sections are different from panel data. 
Even with identical sample sizes, the use of a panel data set will often yield more efficient estimators than a series of independent/repeated cross-sections.

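Example

$$ y_{it} = \lambda_t + \alpha_i + \varepsilon_{it} \quad \text{(random effects)} $$

Suppose we are interested in the change of $\lambda_t$ from one period to another. Then the variance of the estimator $\hat{\lambda}_t - \hat{\lambda}_s$ ($s \neq t$) is given by

$$ Var(\hat{\lambda}_t - \hat{\lambda}_s) = Var(\hat{\lambda}_t) + Var(\hat{\lambda}_s) - 2\,Cov(\hat{\lambda}_t, \hat{\lambda}_s) $$

with $\hat{\lambda}_t = N^{-1}(y_{1t} + \dots + y_{Nt})$, $t = 1,\dots,T$, that is, $\hat{\lambda}_1 = N^{-1}(y_{11} + \dots + y_{N1})$, $\hat{\lambda}_2 = N^{-1}(y_{12} + \dots + y_{N2})$, ..., $\hat{\lambda}_T = N^{-1}(y_{1T} + \dots + y_{NT})$.

Assuming cross-sectional independence,

$$ Cov(\hat{\lambda}_t, \hat{\lambda}_s) = \frac{1}{N^2}\left( Cov(y_{1t}, y_{1s}) + \dots + Cov(y_{Nt}, y_{Ns}) \right) = \frac{1}{N^2} N \sigma_\alpha^2 = \frac{\sigma_\alpha^2}{N} $$

where $\sigma_\alpha^2 = Var(\alpha_i)$. Therefore, $Cov(\hat{\lambda}_t, \hat{\lambda}_s) > 0$ in panel data but $Cov(\hat{\lambda}_t, \hat{\lambda}_s) = 0$ in repeated cross sections. Thus, if one is interested in changes from one period to another, a panel will yield more efficient estimators than a series of cross-sections.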
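The efficiency argument can be checked with a few lines of arithmetic. The sketch below is only an illustration under the assumptions of the example (cross-sectional independence, $Var(\alpha_i) = \sigma_\alpha^2$, $Var(\varepsilon_{it}) = \sigma_\varepsilon^2$); the function name and the numerical values are made up for this purpose.

```python
def var_of_change(n, sigma_alpha2, sigma_eps2, panel):
    """Var(lambda_hat_t - lambda_hat_s), s != t, under cross-sectional independence.

    Assumed model: y_it = lambda_t + alpha_i + eps_it, with
    Var(alpha_i) = sigma_alpha2 and Var(eps_it) = sigma_eps2.
    """
    var_mean = (sigma_alpha2 + sigma_eps2) / n       # Var(lambda_hat_t)
    # With a panel the same individuals appear in both periods, so the two
    # period means share the alpha_i; with repeated cross sections they do not.
    cov_means = sigma_alpha2 / n if panel else 0.0   # Cov(lambda_hat_t, lambda_hat_s)
    return 2.0 * var_mean - 2.0 * cov_means


# Illustrative (made-up) values: N = 100, both variance components equal to 1.
panel_var = var_of_change(100, 1.0, 1.0, panel=True)
rcs_var = var_of_change(100, 1.0, 1.0, panel=False)
print(panel_var, rcs_var)
```

With these values the variance of the estimated change is halved when the same individuals are followed over time.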


Three specializations of general panel methods:
1. Short panels (micro panels): asymptotics assume T small and $N \to \infty$. Data on many individual units and few time periods.
2. Long panels (macro panels): asymptotics assume $T \to \infty$ and N small or $N \to \infty$. Time series data on many individual units. More common with aggregate data.
3. Dynamic models: regressors include lagged dependent variables.
Examples of Micro Panel data

- Panel Study of Income Dynamics (PSID) (https://psidonline.isr.umich.edu)
- The European Community Household Panel (ECHP) (http://ec.europa.eu/eurostat/web/microdata/european-community-household-panel)

Examples of Macro Panel data

- Federal Reserve Bank of St. Louis (https://fred.stlouisfed.org/)
- Yahoo Finance (http://finance.yahoo.com)
- Penn World Table (PWT). Provides purchasing power parity and national income accounts converted to international prices for 188 countries over the last six decades. (http://pwt.econ.upenn.edu)
- World Bank, World Development Indicators (WDI). Provides more than 900 indicators for 152 economies. (www.worldbank.org/data)
- International Monetary Fund (IMF), World Economic Outlook Databases & International Financial Statistics (IFS) provide more than 32000 time series covering more than 200 countries. (www.imf.org)
- Organization for Economic Co-operation and Development (OECD) (www.oecd.org)
- European Central Bank (ECB) (http://www.ecb.int)


Consider the following panel data model

$$ y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} \qquad (1) $$

Note that $i=1,\dots,N$ denotes the individual, firm, country and so on, and $t=1,\dots,T$ is the time period. The term $\alpha_i$ denotes unobservable individual-specific effects and $\varepsilon_{it}$ denotes the remainder disturbance, assumed to be independently and identically distributed (IID).


Advantages of panel data 

1. More data compared to time series or cross-sections, more 
variability/more informative data as variables vary over two dimensions, 
less collinearity among regressors, and more efficiency. Time series data 
suffer from multicollinearity. This is less likely in panel data since the 
cross-section dimension adds a lot of variability. In fact, the variation in the 
data can be decomposed into variation between cross sections and variation 


within cross sections. The former variation is usually bigger. 


2. Reduces the data needs. The richness of panel data obviates the need for 
data on things that may be difficult or impossible to measure (unobserved 


heterogeneity). 


Example: Wage regression

$$ \text{wage}_{it} = \alpha + \beta\,\text{educ}_{it} + \gamma\,\text{abil}_i + \varepsilon_{it} $$

where $\text{abil}_i$ denotes innate ability (constant through time), which cannot be observed. Thus, run OLS on

$$ \text{wage}_{it} = \alpha + \beta\,\text{educ}_{it} + w_{it}, \quad \text{where } w_{it} = \gamma\,\text{abil}_i + \varepsilon_{it} $$

If innate ability is not correlated with education, then $\gamma\,\text{abil}_i$ is just another unobserved factor making up the residual. It is true that OLS will not be the Best Linear Unbiased Estimator (BLUE), because the error term $w_{it} = \gamma\,\text{abil}_i + \varepsilon_{it}$ is serially correlated (see below). Notice that OLS would be consistent, however, and the only substantive problem with relying on OLS for this model is that the standard formula for calculating the standard errors is wrong.

However, the problem is that innate ability might be correlated with education, in which case

$$ E(w_{it} \mid \text{educ}_{it}) \neq 0 \;\Rightarrow\; Cov(\text{educ}_{it}, w_{it}) \neq 0 $$

OLS will be inconsistent (biased regardless of the sample size). In particular, it can be shown that

$$ \underset{N \to \infty}{plim}\; \hat{\beta}^{OLS} = \beta + \gamma\,\frac{Cov(\text{educ}_{it}, \text{abil}_i)}{Var(\text{educ}_{it})} $$

which shows that the OLS estimator is inconsistent unless $Cov(\text{educ}_{it}, \text{abil}_i) = 0$. If $Cov(\text{educ}_{it}, \text{abil}_i) > 0$ (positive correlation) and $\gamma > 0$, then there is an upward bias. If the correlation is negative, we get a negative bias.

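However, panel data can solve this problem by applying particular transformations to the data, which is not possible using cross-sectional data. For instance, write the model at time t-1 and at time t:

$$ \text{wage}_{it-1} = \alpha + \beta\,\text{educ}_{it-1} + (\gamma\,\text{abil}_i + \varepsilon_{it-1}) $$

$$ \text{wage}_{it} = \alpha + \beta\,\text{educ}_{it} + (\gamma\,\text{abil}_i + \varepsilon_{it}) $$

Subtracting the first from the second equation yields

$$ (\text{wage}_{it} - \text{wage}_{it-1}) = \beta(\text{educ}_{it} - \text{educ}_{it-1}) + (\varepsilon_{it} - \varepsilon_{it-1}) $$

$$ \Delta\text{wage}_{it} = \beta\,\Delta\text{educ}_{it} + \Delta\varepsilon_{it} $$

Innate ability has been eliminated because it does not vary through time.

Properties of $\Delta\varepsilon_{it}$

1. $E(\Delta\varepsilon_{it}) = 0$

2. $Var(\Delta\varepsilon_{it}) = Var(\varepsilon_{it} + (-\varepsilon_{it-1})) = Var(\varepsilon_{it}) + (-1)^2 Var(\varepsilon_{it-1}) = 2\sigma_\varepsilon^2$

3. $Cov(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-1}) = E(\Delta\varepsilon_{it}\,\Delta\varepsilon_{it-1}) = E[(\varepsilon_{it} - \varepsilon_{it-1})(\varepsilon_{it-1} - \varepsilon_{it-2})] = -E(\varepsilon_{it-1}^2) = -\sigma_\varepsilon^2$

$Cov(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-2}) = E[(\varepsilon_{it} - \varepsilon_{it-1})(\varepsilon_{it-2} - \varepsilon_{it-3})] = 0$

$Cov(\Delta\varepsilon_{it}, \Delta\varepsilon_{it-s}) = 0$ for $s \geq 2$

(First-order serial correlation!)

OLS will be consistent, though inefficient due to autocorrelation. This is the so-called first-differenced (FD) estimator.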
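A tiny numerical illustration of why differencing helps. The data below are made up (three individuals, three periods, $\beta = 2$, no idiosyncratic noise) and ability is deliberately built to be higher for the more educated; the helper names are hypothetical. Pooled OLS picks up the ability effect, while the first-differenced regression recovers $\beta$.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (intercept included)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fd_slope(x_by_id, y_by_id):
    """OLS slope in the first-differenced equation (no intercept)."""
    num = den = 0.0
    for i in x_by_id:
        for t in range(1, len(x_by_id[i])):
            dx = x_by_id[i][t] - x_by_id[i][t - 1]
            dy = y_by_id[i][t] - y_by_id[i][t - 1]
            num += dx * dy
            den += dx * dx
    return num / den


# Made-up data: alpha = 1, beta = 2, no idiosyncratic noise.
beta = 2.0
ability = {1: 0.0, 2: 5.0, 3: 10.0}   # gamma * abil_i, unobserved in practice
educ = {1: [8, 9, 10], 2: [10, 12, 13], 3: [12, 13, 16]}
wage = {i: [1.0 + beta * e + ability[i] for e in educ[i]] for i in educ}

# Pooled OLS ignores ability; FD differences it away.
b_pooled = ols_slope([e for i in educ for e in educ[i]],
                     [w for i in wage for w in wage[i]])
b_fd = fd_slope(educ, wage)
print(b_pooled, b_fd)
```

Because ability and education are positively correlated in this toy data, the pooled slope lies well above 2, in line with the upward-bias result.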


3. Controls for parameter heterogeneity (related to the previous issue). Consider the following model:

$$ \text{wage}_{it} = \alpha_i + \beta\,\text{educ}_{it} + \varepsilon_{it} $$

where the intercept term is specific to each individual (heterogeneous). What happens if we ignore this heterogeneity and mistakenly assume that the intercept is the same across individuals? Adding and subtracting a common intercept $\mu$,

$$ \text{wage}_{it} = (\mu - \mu) + \alpha_i + \beta\,\text{educ}_{it} + \varepsilon_{it} $$

$$ \text{wage}_{it} = \mu + \beta\,\text{educ}_{it} + w_{it}, \quad \text{where } w_{it} = \alpha_i - \mu + \varepsilon_{it} $$

If the individual-specific intercepts are correlated with education, we will have

$$ E(w_{it} \mid \text{educ}_{it}) \neq 0 \;\Rightarrow\; Cov(\text{educ}_{it}, w_{it}) \neq 0 $$

Thus, OLS will be inconsistent.

-- See figures below --

Notice how closely related are the problems of omitted variables (individual-specific intercepts, which are time invariant) and unobserved heterogeneity (time invariant). You can always argue/set $\alpha_i - \mu = \text{abil}_i$.
1.2 The Fixed Effects ("Within") Estimator

One way to estimate the model is to assume that each $\alpha_i$ is a fixed/constant parameter to be estimated (just like $\beta$). The $\alpha_i$ thus capture the effects of those variables that are peculiar to the i-th individual and that are constant over time. This is called the fixed effects (FE) estimator. We may allow for individual-specific dummies in the model,

$$ y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it}, \quad (\varepsilon_{it} \text{ is IID}) $$

$$ y_{it} = \sum_{j=1}^{N} \alpha_j d_{ij} + \beta x_{it} + \varepsilon_{it} \qquad (2) $$

where $d_{ij} = 1$ if $i = j$ and 0 otherwise. We thus have a set of N dummies in the model. The parameters $\alpha_1,\dots,\alpha_N$ and $\beta$ can be estimated by OLS. It is straightforward to see how to test whether the panel approach is really necessary at all. In other words, to test whether all of the intercept dummy variables have the same parameter,

$$ H_0: \alpha_1 = \alpha_2 = \dots = \alpha_N \quad (N-1 \text{ restrictions}) $$

If this null hypothesis is not rejected, the data can simply be pooled together and standard OLS employed. If this null is rejected, however, then it is not valid to impose the restriction that the intercepts are the same over the cross-sectional units and a panel approach must be employed.


When N is large it may be numerically unattractive to have a regression with so many parameters to estimate. Fortunately, one can compute the estimator in a simpler way. It can be shown that exactly the same estimator for $\beta$ is obtained if the regression is performed in deviations from individual means. Essentially, this implies that we eliminate the individual effects $\alpha_i$ first by transforming the data. To see this, note

$$ \bar{y}_i = \alpha_i + \beta \bar{x}_i + \bar{\varepsilon}_i $$

where $\bar{y}_i = T^{-1}\sum_{t=1}^{T} y_{it}$ and similarly for the other variables. Consequently we can write

$$ (y_{it} - \bar{y}_i) = (\alpha_i - \alpha_i) + \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) $$

$$ (y_{it} - \bar{y}_i) = \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) \qquad (3) $$

This regression involves demeaned variables and therefore does not include the individual effects $\alpha_i$. So, we transform the data into deviations from individual-specific averages (this is called the within-groups transformation because the subtraction is made within each cross-sectional unit) and thereby remove the individual-specific intercepts.

Both (2) and (3) can be estimated by OLS. The estimator is called the fixed effects (FE), least squares dummy variables (LSDV) or within estimator.

The fixed effects estimator focuses on differences 'within' individuals. Put differently, it explains to what extent $y_{it}$ differs from $\bar{y}_i$ and does not explain why $\bar{y}_i$ is different from $\bar{y}_j$. Note that the assumptions about $\beta$ impose that a change in x has the same (ceteris paribus) effect, whether it is a change from one period to the other or a change from one individual to the other.


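The OLS estimator for $\beta$

$$ \hat{\beta}^{FE} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i)}{\sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)^2} $$

or, if $\beta$ were a vector,

$$ \hat{\beta}^{FE} = \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) $$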


Assumption 1: unobserved terms $\alpha_i$ can be freely correlated with $x_{it}$.

Assumption 2: $E(x_{it}\varepsilon_{is}) = 0$ for s = 1, 2, ..., T (strict exogeneity). Clearly, we cannot include $y_{it-1}$ as a regressor.


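Properties of $(\varepsilon_{it} - \bar{\varepsilon}_i)$, where $\bar{\varepsilon}_i = (\varepsilon_{i1} + \dots + \varepsilon_{iT})/T$

1. $E(\varepsilon_{it} - \bar{\varepsilon}_i) = E(\varepsilon_{it}) - E(\bar{\varepsilon}_i) = 0$

2. $Var(\varepsilon_{it} - \bar{\varepsilon}_i) = Var(\varepsilon_{it}) + Var(\bar{\varepsilon}_i) - 2\,Cov(\varepsilon_{it}, \bar{\varepsilon}_i) = \sigma_\varepsilon^2 + \frac{1}{T^2} T \sigma_\varepsilon^2 - \frac{2}{T}\sigma_\varepsilon^2 = \sigma_\varepsilon^2 \frac{T-1}{T}$

3. $Cov(\varepsilon_{it} - \bar{\varepsilon}_i, \varepsilon_{is} - \bar{\varepsilon}_i) = -\sigma_\varepsilon^2 / T$ for $t \neq s$, and the covariance is zero across individuals, since $\varepsilon_{it}$ is IID across individuals and time.

The demeaned errors are therefore mildly correlated within an individual, but this does not harm OLS: the within regression reproduces the LSDV regression (2), whose errors $\varepsilon_{it}$ are IID. Therefore, the FE estimator is unbiased and efficient.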


We now see why this estimator requires strict exogeneity: the error term $\varepsilon_{it} - \bar{\varepsilon}_i = \varepsilon_{it} - \frac{\varepsilon_{i1} + \dots + \varepsilon_{iT}}{T}$ contains all disturbances, whereas the transformed explanatory variable(s) contain all values of the explanatory variable(s), $x_{it} - \bar{x}_i = x_{it} - \frac{x_{i1} + \dots + x_{iT}}{T}$. Hence, we need $E(x_{it}\varepsilon_{is}) = 0$ for s = 1, 2, ..., T; otherwise there will be endogeneity bias if we estimate by OLS.


In the within estimator, the individual-specific intercepts can be estimated as

$$ \hat{\alpha}_i = \bar{y}_i - \hat{\beta}^{FE} \bar{x}_i, \quad i = 1,\dots,N $$

Note that as $T \to \infty$, the FE estimator of both $\alpha_i$ ($i = 1,\dots,N$) and $\beta$ is consistent. However, if T is fixed and $N \to \infty$, as is typical in micro panels, then only the FE estimator of $\beta$ is consistent. The FE estimator of $\alpha_i$ is inconsistent because the number of individual-specific intercepts increases to infinity as $N \to \infty$.


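The covariance matrix for $\hat{\beta}^{FE}$ (vector)

$$ Var(\hat{\beta}^{FE}) = \sigma_\varepsilon^2 \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right)^{-1} $$

with

$$ \hat{\sigma}_\varepsilon^2 = \frac{1}{N(T-1)} \sum_{i=1}^{N}\sum_{t=1}^{T} \left( y_{it} - \hat{\alpha}_i - x_{it}'\hat{\beta}^{FE} \right)^2 = \frac{1}{N(T-1)} \sum_{i=1}^{N}\sum_{t=1}^{T} \left( y_{it} - \bar{y}_i - (x_{it} - \bar{x}_i)'\hat{\beta}^{FE} \right)^2 $$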

It is possible to apply the usual degrees of freedom correction in which case 
the number of explanatory variables is subtracted from the denominator. 
How many degrees of freedom? NT-N-k where k is the number of 
explanatory variables. Note the least squares dummy variables (LSDV) 
method estimates N+k parameters, or put differently, the within estimator 
uses a further N degrees of freedom in constructing the demeaned variables 


(we constructed N individual means). 


Under weak regularity conditions, the fixed effects estimator is 


asymptotically normal, so standard inference can be applied. 
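The pieces above (within transformation, slope, intercepts, error variance with the NT-N-k correction) fit into a short routine. The following is a minimal sketch for a single regressor and a balanced panel; the function name and the six toy observations are illustrative assumptions, not part of the notes' own computations.

```python
def within_estimator(y, x):
    """Fixed effects (within) estimator, one regressor, balanced panel.

    y, x: dicts mapping an individual id to a list of T observations.
    Returns (beta_hat, alpha_hat by id, sigma2_hat, standard error of beta_hat).
    """
    ids = list(y)
    n, t = len(ids), len(y[ids[0]])
    ybar = {i: sum(y[i]) / t for i in ids}
    xbar = {i: sum(x[i]) / t for i in ids}
    # Sums of cross-products of deviations from individual means.
    sxy = sum((x[i][s] - xbar[i]) * (y[i][s] - ybar[i]) for i in ids for s in range(t))
    sxx = sum((x[i][s] - xbar[i]) ** 2 for i in ids for s in range(t))
    beta = sxy / sxx
    # Intercepts: alpha_i = ybar_i - beta * xbar_i.
    alpha = {i: ybar[i] - beta * xbar[i] for i in ids}
    rss = sum((y[i][s] - alpha[i] - beta * x[i][s]) ** 2 for i in ids for s in range(t))
    sigma2 = rss / (n * t - n - 1)     # NT - N - k degrees of freedom, k = 1
    se = (sigma2 / sxx) ** 0.5
    return beta, alpha, sigma2, se


# Toy balanced panel: two individuals, three periods.
y = {1: [55, 50, 66], 2: [77, 85, 90]}
x = {1: [70, 68, 80], 2: [94, 100, 123]}
beta_fe, alpha_fe, sigma2_fe, se_fe = within_estimator(y, x)
print(beta_fe, alpha_fe, se_fe)
```

Running an OLS regression of y on x and one dummy per individual should give the same slope; the routine above simply avoids estimating the N dummies explicitly.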


The within estimator regression will give identical parameters and standard 
errors as would have been obtained directly from the LSDV regression, but 
without the hassle of estimating so many parameters. The disadvantage of 
within estimator regression, however, is that we lose the ability to 
determine the influences of all of the variables that affect the dependent 


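variable but do not vary over time. For example, consider

$$ \text{pol}_{it} = \alpha_i + \beta\,\text{GDP}_{it} + \gamma\,\text{ineq}_i + \varepsilon_{it} $$

Averaging over time,

$$ \overline{\text{pol}}_i = \alpha_i + \beta\,\overline{\text{GDP}}_i + \gamma\,\text{ineq}_i + \bar{\varepsilon}_i $$

Consequently we can write

$$ (\text{pol}_{it} - \overline{\text{pol}}_i) = (\alpha_i - \alpha_i) + \beta(\text{GDP}_{it} - \overline{\text{GDP}}_i) + \gamma(\text{ineq}_i - \text{ineq}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) $$

$$ (\text{pol}_{it} - \overline{\text{pol}}_i) = \beta(\text{GDP}_{it} - \overline{\text{GDP}}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) $$

so the time-invariant variable $\text{ineq}_i$ drops out and $\gamma$ cannot be estimated.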


1.3 The Between Estimator 
An alternative to the within estimator (fixed effects) would be to simply run a cross-sectional regression on the time-averaged data, which is known as the between estimator,

$$ \bar{y}_i = \alpha + \beta \bar{x}_i + \bar{\varepsilon}_i, \quad i = 1,\dots,N $$

An advantage of the between estimator over the within estimator is that this averaging often reduces the effect of measurement error in the variables on the estimation process.
1.4 The First-Differenced (FD) Estimator 
Another way to estimate the model is to use the first-differenced estimator

$$ \Delta y_{it} = \beta\,\Delta x_{it} + \Delta\varepsilon_{it} $$

Clearly this removes the individual fixed effect, and so we can obtain consistent estimates of $\beta$ by estimating the equation in first differences by OLS.


Assumption 1: unobserved terms $\alpha_i$ can be freely correlated with $x_{it}$.

Assumption 2: $E(x_{it}\varepsilon_{is}) = 0$ for s = t, t-1. This is a weaker form of strict exogeneity than what is required for fixed effects (FE), in the sense that $E(x_{it}\varepsilon_{it-2}) = 0$, for example, is not required. Thus, if there is feedback from $\varepsilon_{it}$ to $x_{it}$ that takes more than two periods, FD will be consistent whereas FE will not (hence weaker form of strict exogeneity).

You now see why this estimator requires exogeneity: the error term contains $\varepsilon_{it}$ and $\varepsilon_{it-1}$, whereas the vector of transformed explanatory variable(s) contains $x_{it}$ and $x_{it-1}$. Hence, we need $E(x_{it}\varepsilon_{is}) = 0$ for s = t, t-1; otherwise there will be endogeneity bias if we estimate by OLS.

Important: FE versus FD. 

So, FE and FD are two alternative ways of removing the fixed effect. 


Which method should we use? In general, FD is consistent but inefficient 


(due to autocorrelation). 


(i) The FD and FE estimators are the same if T=2 (i.e. we have only two time periods).


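Proof

FD: $(y_{it} - y_{it-1}) = \beta(x_{it} - x_{it-1}) + (\varepsilon_{it} - \varepsilon_{it-1})$

Note that there is just one cross-section!

T=2: $(y_{i2} - y_{i1}) = \beta(x_{i2} - x_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})$

We cannot have autocorrelation. Thus, OLS is consistent and efficient.

FE: $(y_{it} - \bar{y}_i) = \beta(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i)$

t=1: $\left(y_{i1} - \frac{y_{i1} + y_{i2}}{2}\right) = \beta\left(x_{i1} - \frac{x_{i1} + x_{i2}}{2}\right) + \left(\varepsilon_{i1} - \frac{\varepsilon_{i1} + \varepsilon_{i2}}{2}\right)$, that is,

$$ \frac{y_{i1} - y_{i2}}{2} = \beta\,\frac{x_{i1} - x_{i2}}{2} + \frac{\varepsilon_{i1} - \varepsilon_{i2}}{2} $$

t=2: $\left(y_{i2} - \frac{y_{i1} + y_{i2}}{2}\right) = \beta\left(x_{i2} - \frac{x_{i1} + x_{i2}}{2}\right) + \left(\varepsilon_{i2} - \frac{\varepsilon_{i1} + \varepsilon_{i2}}{2}\right)$, that is,

$$ \frac{y_{i2} - y_{i1}}{2} = \beta\,\frac{x_{i2} - x_{i1}}{2} + \frac{\varepsilon_{i2} - \varepsilon_{i1}}{2} $$

The two equations are the same up to sign, and each is the FD equation scaled by 1/2. So, one of the 2 cross-sections is redundant.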
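The equivalence for T=2 is easy to confirm numerically. The sketch below uses made-up numbers and hypothetical helper names; it computes both estimators on a two-period panel and, for contrast, on a three-period panel.

```python
def fd_estimator(y, x):
    """OLS on first differences (no intercept), one regressor."""
    num = den = 0.0
    for i in y:
        for t in range(1, len(y[i])):
            dx = x[i][t] - x[i][t - 1]
            num += dx * (y[i][t] - y[i][t - 1])
            den += dx * dx
    return num / den


def fe_estimator(y, x):
    """OLS on deviations from individual means, one regressor."""
    num = den = 0.0
    for i in y:
        t = len(y[i])
        xbar, ybar = sum(x[i]) / t, sum(y[i]) / t
        num += sum((a - xbar) * (b - ybar) for a, b in zip(x[i], y[i]))
        den += sum((a - xbar) ** 2 for a in x[i])
    return num / den


# Made-up two-period panel (T = 2).
y2 = {1: [55, 50], 2: [77, 85], 3: [40, 47]}
x2 = {1: [70, 68], 2: [94, 100], 3: [60, 71]}
# Made-up three-period panel (T = 3).
y3 = {1: [55, 50, 66], 2: [77, 85, 90]}
x3 = {1: [70, 68, 80], 2: [94, 100, 123]}

print(fd_estimator(y2, x2), fe_estimator(y2, x2))   # identical
print(fd_estimator(y3, x3), fe_estimator(y3, x3))   # differ
```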
(ii) However, for T>2, the FD and FE estimators are NOT the same. Under "classical assumptions", i.e. $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$, the FE estimator will be more efficient than the FD estimator (as in this case the FD residual $\Delta\varepsilon_{it}$ will exhibit negative serial correlation, $E(\Delta\varepsilon_{it}\,\Delta\varepsilon_{it-1}) = -\sigma_\varepsilon^2$).

Under the null hypothesis that the model is correctly specified, FE and FD will differ only because of sampling error. Hence, if FE and FD are significantly different, so that the differences in the estimates cannot be attributed to sampling error, we should worry about the validity of the strict exogeneity assumption.

Note that strict exogeneity rules out feedback from past $\varepsilon_{it}$ shocks to current $x_{it}$. One implication of this is that FE and FD will not yield consistent estimates if the model contains lagged dependent variables (dynamic models). In this case, we may be able to use instruments to get consistent estimates.
1.5 An extension of the Fixed Effects Estimator
Consider a fixed effects model with a two-way error component

$$ y_{it} = \alpha_i + \lambda_t + \beta x_{it} + \varepsilon_{it}, \quad (\varepsilon_{it} \text{ is IID}) $$

or, written with individual and time dummies,

$$ y_{it} = \sum_{j=1}^{N} \alpha_j d_{ij} + \sum_{s=1}^{T} \lambda_s d_{ts} + \beta x_{it} + \varepsilon_{it} $$

Note that $\lambda_t$ is individual-invariant and accounts for any time-specific effect that is not included in the regression. For example, it could account for strike year effects that disrupt production, oil price effects, macroeconomic and financial crisis effects, etc.

However, the number of parameters to be estimated now would be k+N+T, and the within transformation in this two-way model would be more complex.
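For a balanced panel, the two-way within transformation has a well-known closed form: subtract the individual mean and the period mean, then add back the overall mean, $\tilde{y}_{it} = y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}$. The sketch below applies it to made-up, noise-free data with a single regressor; the function names are hypothetical and the balanced-panel restriction is an assumption of this illustration.

```python
def two_way_demean(z):
    """Two-way within transformation for a balanced panel.

    z: list of N rows (individuals), each a list of T values (periods).
    """
    n, t = len(z), len(z[0])
    row = [sum(r) / t for r in z]                                   # individual means
    col = [sum(z[i][s] for i in range(n)) / n for s in range(t)]    # period means
    grand = sum(row) / n                                            # overall mean
    return [[z[i][s] - row[i] - col[s] + grand for s in range(t)] for i in range(n)]


def two_way_fe(y, x):
    """OLS slope on two-way demeaned data (one regressor)."""
    yd, xd = two_way_demean(y), two_way_demean(x)
    num = sum(a * b for rx, ry in zip(xd, yd) for a, b in zip(rx, ry))
    den = sum(a * a for rx in xd for a in rx)
    return num / den


# Made-up data: y_it = alpha_i + lambda_t + 1.5 * x_it, no noise.
alpha = [1.0, 4.0, -2.0]
lam = [0.0, 3.0, -1.0]
x = [[1.0, 2.0, 4.0], [2.0, 5.0, 3.0], [6.0, 1.0, 2.0]]
y = [[alpha[i] + lam[s] + 1.5 * x[i][s] for s in range(3)] for i in range(3)]
b_two_way = two_way_fe(y, x)
print(b_two_way)
```

Both sets of effects are wiped out by the transformation, so the slope is recovered without estimating any of the N+T dummy coefficients.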


1.6 The Pooled OLS Estimator 
Consider

$$ y_{it} = \beta x_{it} + (\alpha_i + \varepsilon_{it}) $$

where I have put $(\alpha_i + \varepsilon_{it})$ within parentheses to emphasize that these terms are unobserved and will not be estimated separately.

Assumption 1: unobserved terms $\alpha_i$ are uncorrelated with $x_{it}$.

Assumption 2: $E(x_{it}\varepsilon_{it}) = 0$ (contemporaneously uncorrelated). This is an even weaker form of exogeneity than what is required for the FD and FE estimators, in the sense that $E(x_{it}\varepsilon_{it-1}) = 0$, for example, is not required.

Clearly, under these assumptions $w_{it}^{POLS} = \alpha_i + \varepsilon_{it}$ will be uncorrelated with $x_{it}$, implying we can estimate $\beta$ consistently using OLS. In this context we refer to this as the Pooled OLS (POLS) estimator.
1.7 Random effects
Another way to estimate the model is to assume that each $\alpha_i$ is a random draw from a common distribution with a finite mean and finite variance (i.e. random factors, IID over individuals). Re-write,

$$ y_{it} = \alpha_i + \beta x_{it} + \varepsilon_{it} $$

$$ y_{it} = \alpha_i + \beta x_{it} + u_i - u_i + \varepsilon_{it} $$

$$ y_{it} = (\alpha_i - u_i) + \beta x_{it} + (u_i + \varepsilon_{it}) $$

$$ y_{it} = \alpha + \beta x_{it} + w_{it} $$

where $\alpha_i = \alpha + u_i$ (thus, $\alpha = \alpha_i - u_i$) and $w_{it} = u_i + \varepsilon_{it}$, with

$$ u_i \sim IID(0, \sigma_u^2), \quad \varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2), \quad E(u_i \varepsilon_{it}) = 0 $$

and $u_i$ measures the random deviation of each individual's intercept term from the 'global' intercept term $\alpha$.

Assumption 1: unobserved terms $u_i$ are uncorrelated with $x_{it}$.

Assumption 2: $E(x_{it}\varepsilon_{is}) = 0$ for s = 1, 2, ..., T (strict exogeneity).

Note that this combines the strongest assumption underlying FE estimation (strict exogeneity) with the strongest assumption underlying POLS estimation (no correlation between unobserved effects and the explanatory variables).

There are no dummy variables to capture heterogeneity in the cross-sectional dimension. Instead, this occurs via the $u_i$ terms.

Note: Under the above assumptions:

1) POLS will be consistent but inefficient because it ignores the random effects $u_i$, that is, because the composite error term $(u_i + \varepsilon_{it})$ is autocorrelated. It exploits both the within and between dimensions of the data.

2) FE will be consistent but inefficient due to the fact that it exploits only the within dimension of the data.

3) FD will be consistent but inefficient due to autocorrelation.


Properties of composite error $w_{it} = u_i + \varepsilon_{it}$

1. $E(w_{it}) = E(u_i) + E(\varepsilon_{it}) = 0 \quad \forall i, \forall t$

2. $Var(w_{it}) = Var(u_i + \varepsilon_{it}) = Var(u_i) + Var(\varepsilon_{it}) = \sigma_u^2 + \sigma_\varepsilon^2 = \sigma_w^2 \quad \forall i, \forall t$

3. $Cov(w_{it}, w_{it-1}) = E(w_{it} w_{it-1}) = E[(u_i + \varepsilon_{it})(u_i + \varepsilon_{it-1})] = E(u_i^2) + E(u_i \varepsilon_{it-1}) + E(\varepsilon_{it} u_i) + E(\varepsilon_{it}\varepsilon_{it-1}) = \sigma_u^2 + 0 + 0 + 0 = \sigma_u^2 \quad \forall i, \forall t$

$Cov(w_{it}, w_{it-s}) = E(w_{it} w_{it-s}) = E[(u_i + \varepsilon_{it})(u_i + \varepsilon_{it-s})] = \sigma_u^2, \quad s \geq 1, \; \forall i, \forall t$

(Higher-order serial correlation!)

That is, the correlation of the error terms over time is attributed to the individual effects $u_i$.

Also note that if $\sigma_u^2$ is high relative to $\sigma_\varepsilon^2$, the serial correlation in the error terms will be high. As a result the conventional estimator of the covariance matrix for the OLS estimator will not be correct.

Thus, the composite error is serially correlated, which implies that the optimal (most efficient) estimator should be a Generalized Least Squares (GLS) estimator. This is the so-called random effects (RE) estimator for panel data.


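Derivation of the GLS-random effects estimator

(based on Hsiao, C. (1986), Analysis of Panel Data, Cambridge University Press)

For individual i all errors can be stacked as $u_i \iota_T + \varepsilon_i$, where $\iota_T = (1,\dots,1)'$ is of dimension T and $\varepsilon_i = (\varepsilon_{i1},\dots,\varepsilon_{iT})'$.

$$ Var(u_i \iota_T + \varepsilon_i) = \Omega = \sigma_u^2 \iota_T \iota_T' + \sigma_\varepsilon^2 I_T $$

For each individual i we transform the data by premultiplying $y_i = (y_{i1},\dots,y_{iT})'$ (and likewise the regressors) by $\Omega^{-1/2}$.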

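Thus, the GLS estimator is given by

$$ \hat{\beta}^{GLS} = \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}\, T \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right]^{-1} \left[ \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) + \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}\, T \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y}) \right] $$

where $\bar{x} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} x_{it}$ is the overall sample average (and similarly for $\bar{y}$).

When $T \to \infty$ the term $\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2} \to 0$ and

$$ \hat{\beta}^{GLS} \to \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' \right)^{-1} \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(y_{it} - \bar{y}_i) = \hat{\beta}^{FE} $$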
It can also be derived that

$$ \hat{\beta}^{GLS} = \Delta \hat{\beta}^{B} + (I - \Delta)\hat{\beta}^{FE} $$

where $\hat{\beta}^{B} = \left( \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right)^{-1} \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y})$ is the between estimator for $\beta$. It is the OLS estimator in the model for the individual means

$$ \bar{y}_i = \alpha + \beta \bar{x}_i + (u_i + \bar{\varepsilon}_i), \quad i = 1,\dots,N $$

and $\Delta$ is the weighting matrix that is proportional to the inverse of the covariance matrix of $\hat{\beta}^{B}$. Thus, the GLS estimator is a matrix-weighted average of the between estimator and the within (fixed-effects) estimator, where the weight depends upon the relative variances of the two estimators.

The between estimator ignores any information within individuals. The GLS estimator, under Assumptions 1-2, is the optimal combination of the within and between estimators, and is therefore more efficient than either of these two estimators.
The RE estimator involves (as any other GLS estimator) running OLS on a "suitably transformed" model. The term "suitably transformed" means that the transformed model has serially uncorrelated errors. Therefore, OLS is the best linear unbiased estimator (BLUE) in this case. Averaging over time, in terms of unit means,

$$ \bar{y}_i = \alpha + \beta \bar{x}_i + \bar{w}_i $$

Multiply by $\theta$, where $\theta = 1 - \sqrt{\dfrac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}}$:

$$ \theta\bar{y}_i = \theta\alpha + \theta\beta\bar{x}_i + \theta\bar{w}_i $$

Subtract this equation from the initial one. The transformed model is given by

$$ (y_{it} - \theta\bar{y}_i) = \alpha(1 - \theta) + \beta(x_{it} - \theta\bar{x}_i) + (w_{it} - \theta\bar{w}_i) $$

It can be shown that, for $t \neq s$,

$$ Cov(w_{it} - \theta\bar{w}_i, w_{is} - \theta\bar{w}_i) = E\left[ (u_i(1-\theta) + \varepsilon_{it} - \theta\bar{\varepsilon}_i)(u_i(1-\theta) + \varepsilon_{is} - \theta\bar{\varepsilon}_i) \right] = 0 $$

Note that if $T \to \infty$, then $\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2} \to 0$ and $\theta \to 1$, and the RE (GLS) estimator tends to the fixed effects (FE) estimator (micro panel versus macro panel).

The above equation is very interesting because it involves quasi-demeaned data on each variable. In other words, rather than subtracting the entire individual mean (which is what fixed effects does), the transformation subtracts only some fraction of the mean, as defined by $\theta$. Notice that this implies that unobserved heterogeneity (as reflected by the individual-specific time-invariant effects) is not fully eliminated because

$$ (w_{it} - \theta\bar{w}_i) = u_i(1 - \theta) + (\varepsilon_{it} - \theta\bar{\varepsilon}_i) $$

As usual, GLS is infeasible because we do not know the parameter $\theta$. So, $\theta$ has to be estimated first. This involves estimating $\sigma_u^2$ and $\sigma_\varepsilon^2$. One way to do that, the simplest perhaps, is to use POLS in the first stage to obtain estimates of the composite residual $\hat{w}_{it}$ and its variance $\hat{\sigma}_w^2$. Based on this, we can calculate $\hat{\sigma}_u^2$ as the covariance between $\hat{w}_{it}$ and $\hat{w}_{it-1}$ (for instance), and then calculate

$$ \hat{\sigma}_\varepsilon^2 = \hat{\sigma}_w^2 - \hat{\sigma}_u^2 \quad (\sigma_w^2 = \sigma_u^2 + \sigma_\varepsilon^2) $$

We can then plug $\hat{\sigma}_u^2, \hat{\sigma}_\varepsilon^2$ into the formula for $\theta$:

$$ \hat{\theta} = 1 - \sqrt{\frac{\hat{\sigma}_\varepsilon^2}{\hat{\sigma}_\varepsilon^2 + T\hat{\sigma}_u^2}} $$

Then, estimate the transformed equation

$$ (y_{it} - \hat{\theta}\bar{y}_i) = \alpha(1 - \hat{\theta}) + \beta(x_{it} - \hat{\theta}\bar{x}_i) + (w_{it} - \hat{\theta}\bar{w}_i) $$

This is the Feasible Generalized Least Squares (FGLS) estimator.
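The role of $\theta$ can be made concrete with a small calculation. The sketch below assumes the square-root form $\theta = 1 - \sqrt{\sigma_\varepsilon^2 / (\sigma_\varepsilon^2 + T\sigma_u^2)}$, which is the value that makes the transformed errors serially uncorrelated, and verifies this using $Cov(w_{it}, w_{is}) = \sigma_u^2$ and $Var(\bar{w}_i) = Cov(w_{it}, \bar{w}_i) = \sigma_u^2 + \sigma_\varepsilon^2 / T$. Function names and numerical values are illustrative.

```python
import math


def theta(sigma_eps2, sigma_u2, t):
    """Quasi-demeaning weight of the random effects transformation."""
    return 1.0 - math.sqrt(sigma_eps2 / (sigma_eps2 + t * sigma_u2))


def cov_transformed(sigma_eps2, sigma_u2, t):
    """Cov(w_it - theta*wbar_i, w_is - theta*wbar_i) for t != s."""
    th = theta(sigma_eps2, sigma_u2, t)
    a = sigma_u2 + sigma_eps2 / t      # Var(wbar_i) = Cov(w_it, wbar_i)
    return sigma_u2 - 2.0 * th * a + th * th * a


def quasi_demean(series, th):
    """Subtract the fraction th of the individual mean from each observation."""
    mean = sum(series) / len(series)
    return [v - th * mean for v in series]


print(theta(1.0, 2.0, 5), cov_transformed(1.0, 2.0, 5))
print(theta(1.0, 0.0, 5))         # no individual effects: theta = 0, i.e. pooled OLS
print(theta(1.0, 2.0, 10 ** 6))   # very long panel: theta close to 1, i.e. FE
print(quasi_demean([55, 50, 66], theta(1.0, 2.0, 3)))
```

The two limiting cases show how RE sits between pooled OLS ($\theta = 0$) and the within estimator ($\theta = 1$).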


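Also, another consistent estimator of $\sigma_\varepsilon^2$ is obtained from the within residuals

$$ \hat{\sigma}_\varepsilon^2 = \frac{1}{N(T-1)} \sum_{i=1}^{N}\sum_{t=1}^{T} \left( y_{it} - \bar{y}_i - (x_{it} - \bar{x}_i)'\hat{\beta}^{FE} \right)^2 $$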


Nektarios Aslanidis 


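Under weak regularity conditions, the random effects estimator is asymptotically normal with covariance matrix given by

$$ Var(\hat{\beta}^{RE}) = \sigma_\varepsilon^2 \left( \sum_{i=1}^{N}\sum_{t=1}^{T} (x_{it} - \bar{x}_i)(x_{it} - \bar{x}_i)' + \frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2}\, T \sum_{i=1}^{N} (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})' \right)^{-1} $$

As long as $\frac{\sigma_\varepsilon^2}{\sigma_\varepsilon^2 + T\sigma_u^2} > 0$, the random effects estimator is more efficient than the fixed effects estimator ($T\sum_{i}(\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})'$ is positive definite). The gain in efficiency is due to using the between variation in the data, $(\bar{x}_i - \bar{x})$. The covariance matrix is routinely estimated by the OLS expressions in the transformed model given above.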


1.8 Fixed Effects or Random Effects 

- Testing for non-zero correlation between the unobserved (individual) effect and the regressor(s): FE versus RE. The RE estimator requires that the individual effect be uncorrelated with the regressors for it to be consistent. If this assumption is not tenable, the FE estimator should be used. In the present context, the FE estimator is consistent regardless of whether $\alpha_i$ is or is not correlated with $x_{it}$, while the RE estimator requires this correlation to be zero in order to be consistent. Strict exogeneity is assumed for both models.


The Hausman statistic is computed as

$$ H = (\hat{\beta}^{FE} - \hat{\beta}^{RE})' \left[ Var(\hat{\beta}^{FE}) - Var(\hat{\beta}^{RE}) \right]^{-1} (\hat{\beta}^{FE} - \hat{\beta}^{RE}) $$

using matrix notation. Note that because the random effects estimator is efficient under the null,

$$ Var(\hat{\beta}^{FE} - \hat{\beta}^{RE}) = Var(\hat{\beta}^{FE}) - Var(\hat{\beta}^{RE}) $$

Under the null hypothesis,

$$ plim(\hat{\beta}^{FE} - \hat{\beta}^{RE}) = 0 $$

and this test statistic follows a chi-squared distribution with M degrees of freedom, where M is the number of time-varying explanatory variables in the model. In the case of a single slope parameter, the Hausman statistic is given by

$$ H = \frac{(\hat{\beta}^{FE} - \hat{\beta}^{RE})^2}{Var(\hat{\beta}^{FE}) - Var(\hat{\beta}^{RE})} \sim \chi^2(1) $$

Failing to reject the null hypothesis implies that the individual effects are uncorrelated with the explanatory variable(s). Thus, we may decide to use the RE model in the analysis on the grounds that this model is efficient. The null hypothesis is that both models are consistent, and a statistically significant difference is therefore interpreted as evidence against the RE model.

Also, in practice the estimated covariance matrix

$$ Var(\hat{\beta}^{FE} - \hat{\beta}^{RE}) = Var(\hat{\beta}^{FE}) - Var(\hat{\beta}^{RE}) $$

may not be positive definite in finite samples, such that the inverse cannot be computed.
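For a single slope the test reduces to a one-line calculation. The sketch below is a minimal illustration with made-up estimates and variances; the function name is hypothetical, and 3.841 is the usual 5% critical value of the chi-squared distribution with one degree of freedom.

```python
def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one slope coefficient.

    Returns None when Var(b_fe) - Var(b_re) is not positive, which can
    happen in finite samples; the statistic is then not computable.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        return None
    return (b_fe - b_re) ** 2 / diff_var


CHI2_1_CRIT_5PCT = 3.841   # 5% critical value, chi-squared with 1 d.o.f.

# Made-up estimates for illustration only.
h = hausman_scalar(b_fe=0.52, b_re=0.45, var_fe=0.0009, var_re=0.0004)
print(h, h > CHI2_1_CRIT_5PCT)   # large H: evidence against the RE model
print(hausman_scalar(0.50, 0.40, 0.001, 0.002))   # None: variance difference negative
```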
- Is the key explanatory variable constant over time? In this case, the FE estimator may not be so appropriate because the within transformation will eliminate this variable.

$$ \text{w\_im}_{it} = \alpha_i + \beta\,\text{ed}_{it} + \gamma\,\text{ed\_h}_i + \varepsilon_{it} $$

$$ \overline{\text{w\_im}}_i = \alpha_i + \beta\,\overline{\text{ed}}_i + \gamma\,\text{ed\_h}_i + \bar{\varepsilon}_i $$

$$ (\text{w\_im}_{it} - \overline{\text{w\_im}}_i) = \beta(\text{ed}_{it} - \overline{\text{ed}}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i) $$

On the other hand, the RE estimator can control for as many time-constant variables as possible.

- It is often argued that the RE model is more appropriate when the cross sections in the sample can be thought of as having been randomly selected from one population, but a FE model is more plausible when the cross sections effectively are the whole population (e.g., stocks traded on a particular exchange).

- Since there are fewer parameters to be estimated with the RE model (no dummy or within transformation to perform) and thus degrees of freedom are saved, the RE has an advantage.


- Are inferences made conditional on the effects that are in the sample or unconditional?
The FE estimator implies that inferences are made 'conditional upon the effects of the model'. This means that we can only speak about those individuals included in the sample. That is, it essentially considers the distribution of $y_{it}$ given $\alpha_i$, where the fixed effects can be estimated. This makes sense intuitively if the individuals are 'one of a kind' and cannot be viewed as a random draw from the same underlying distribution (e.g., countries, large companies, etc). Inferences are with respect to the effects that are in the sample.

On the other hand, the RE estimator implies that inferences are made 'unconditionally'. Basically, this is because in this model there is an implicit assumption that all individual effects come from a common distribution. Thus, the nature of the effect of any individual not included in the sample can be predicted. In fact, this question is related to the size of N. If N is small, the FE may be preferred; otherwise, the RE model is more sensible.

Thus, the random effects method allows one to make inference with respect to the population characteristics. One way to formalize this is to note that the random effects model says

$$ E(y_{it} \mid x_{it}) = \beta x_{it} $$

while for the fixed effects

$$ E(y_{it} \mid x_{it}, \alpha_i) = \alpha_i + \beta x_{it} $$

The parameter $\beta$ in the two conditional expectations is the same only if $E(\alpha_i \mid x_{it}) = 0$. So, the reason why one may prefer fixed effects is that some interest lies in the alphas, which is the case if the number of individuals is relatively small and of a specific nature.
1.9 Mean Group (MG) estimator
Consider the following model

$$ y_{it} = \alpha_i + \beta_i x_{it} + \varepsilon_{it} $$

Assumption: Parameter heterogeneity can be freely correlated with $x_{it}$.

Suppose we are interested in the average effect across individuals (the mean impact of $\alpha_i$ and $\beta_i$ on $y_{it}$). The Mean Group (MG) estimator estimates the individual-specific time series regressions by standard OLS and then averages these coefficients over individuals:

$$ \hat{\beta}^{MG} = \frac{1}{N}\sum_{i=1}^{N} \hat{\beta}_i^{OLS} $$

The MG estimator is consistent and asymptotically normal for $N \to \infty$.

The variance of the MG estimator is given by

$$ Var(\hat{\beta}^{MG}) = \frac{1}{N(N-1)}\sum_{i=1}^{N} (\hat{\beta}_i^{OLS} - \hat{\beta}^{MG})^2 $$

Standard inference applies.

The advantage of the MG estimator is that we do not calculate the variances of the estimates for each individual (in this case, we would need to account for cross-sectional dependence, if there is any). Instead, we compute the variance of the estimates over individuals. A further advantage of MG is that we can accommodate unbalanced panels.
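The two steps of the MG estimator (unit-by-unit OLS, then averaging) translate directly into code. This is a minimal sketch for one regressor; the helper names and the noise-free, unbalanced toy data with true slopes 1, 2 and 3 are made up for illustration.

```python
def unit_ols(x, y):
    """Intercept and slope from OLS on one individual's time series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def mean_group(x_by_id, y_by_id):
    """Mean Group slope and its variance across individuals."""
    slopes = [unit_ols(x_by_id[i], y_by_id[i])[1] for i in x_by_id]
    n = len(slopes)
    b_mg = sum(slopes) / n
    var_mg = sum((b - b_mg) ** 2 for b in slopes) / (n * (n - 1))
    return b_mg, var_mg


# Made-up unbalanced panel: individuals have 4, 3 and 5 observations.
x = {1: [1.0, 2.0, 3.0, 4.0], 2: [2.0, 4.0, 5.0], 3: [0.0, 1.0, 3.0, 6.0, 7.0]}
true_ab = {1: (0.5, 1.0), 2: (2.0, 2.0), 3: (-1.0, 3.0)}   # (alpha_i, beta_i)
y = {i: [true_ab[i][0] + true_ab[i][1] * v for v in x[i]] for i in x}

b_mg, var_mg = mean_group(x, y)
print(b_mg, var_mg)
```

Nothing in the routine requires the individuals to have the same number of periods, which is the sense in which unbalanced panels are accommodated.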
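1.10 Dynamic Panel Data Models

An autoregressive panel data model, AR(1)

$$ y_{it} = \gamma y_{it-1} + \alpha_i + \varepsilon_{it}, \quad |\gamma| < 1 $$

The fixed effects estimator for $\gamma$

$$ \hat{\gamma}^{FE} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - \bar{y}_i)(y_{it-1} - \bar{y}_{i,-1}) / NT}{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it-1} - \bar{y}_{i,-1})^2 / NT} $$

where $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{y}_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{it-1}$. Substituting the AR(1) model into the estimator yields

$$ \hat{\gamma}^{FE} = \gamma + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (\varepsilon_{it} - \bar{\varepsilon}_i)(y_{it-1} - \bar{y}_{i,-1}) / NT}{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it-1} - \bar{y}_{i,-1})^2 / NT} $$

It can be shown that

$$ \underset{N \to \infty}{plim}\; \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (\varepsilon_{it} - \bar{\varepsilon}_i)(y_{it-1} - \bar{y}_{i,-1}) = -\frac{\sigma_\varepsilon^2}{T^2} \cdot \frac{(T-1) - T\gamma + \gamma^T}{(1-\gamma)^2} \neq 0 $$

For fixed T and $N \to \infty$, the fixed effects estimator is biased and inconsistent!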


Example

» SIGMA=1;
» T=5;
» G=0.2;
» BIAS=-((T-1)-T*G+G^T)/((T^2)*(1-G)^2);
» BIAS;
-0.18752000

More persistent process
» G=0.8;
» BIAS=-((T-1)-T*G+G^T)/((T^2)*(1-G)^2);
» BIAS;
-0.32768000

Larger T
» T=100;
» BIAS=-((T-1)-T*G+G^T)/((T^2)*(1-G)^2);
» BIAS;
-0.047500000
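The same calculation can be reproduced in Python. The sketch below evaluates the quantity the session above calls BIAS, that is, the probability limit of the numerator term for fixed T; the function name is hypothetical and the error variance enters explicitly as `sigma2`.

```python
def fe_numerator_plim(t, gamma, sigma2=1.0):
    """plim (N -> infinity) of (1/NT) * sum (eps_it - eps_bar_i)(y_it-1 - y_bar_i,-1)."""
    return -sigma2 * ((t - 1) - t * gamma + gamma ** t) / (t ** 2 * (1 - gamma) ** 2)


print(fe_numerator_plim(5, 0.2))     # short panel, low persistence
print(fe_numerator_plim(5, 0.8))     # more persistent process
print(fe_numerator_plim(100, 0.8))   # larger T: the term shrinks towards zero
```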


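Note that the inconsistency is not caused by anything we assumed about the alphas. The problem is that $Cov((\varepsilon_{it} - \bar{\varepsilon}_i), (y_{it-1} - \bar{y}_{i,-1})) \neq 0$.

However, if $T \to \infty$, then $-\frac{\sigma_\varepsilon^2}{T^2} \cdot \frac{(T-1) - T\gamma + \gamma^T}{(1-\gamma)^2} \to 0$! So, fixed effects is consistent when $N, T \to \infty$.

Take first differences and calculate

$$ y_{it} - y_{it-1} = \gamma(y_{it-1} - y_{it-2}) + (\varepsilon_{it} - \varepsilon_{it-1}) $$

OLS is not consistent since $Cov(y_{it-1}, \varepsilon_{it-1}) \neq 0$, even when $T \to \infty$. This transformed model suggests IV estimation. Given $\varepsilon_{it} \sim IID(0, \sigma_\varepsilon^2)$ (no autocorrelation), for example, use the instrument $y_{it-2}$, as

$Cov((y_{it-1} - y_{it-2}), y_{it-2}) \neq 0$ (relevant)

$Cov((\varepsilon_{it} - \varepsilon_{it-1}), y_{it-2}) = 0$ (exogenous)

Thus, the IV estimator

$$ \hat{\gamma}^{IV} = \frac{\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}(y_{it} - y_{it-1})}{\sum_{i=1}^{N}\sum_{t=2}^{T} y_{it-2}(y_{it-1} - y_{it-2})} $$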


A necessary condition for consistency 


1 N T 
lim| ————— nage Vt | SO 
B (aa _ 1) ye uk it We] 


for either N>o or N,T> 0. 
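A small simulation can make the contrast between the fixed effects and the IV estimator concrete. The sketch below is illustrative only: the sample sizes, the seed, the burn-in length and the unit variances of $\alpha_i$ and $\varepsilon_{it}$ are all assumptions chosen for the example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, gamma = 20000, 6, 0.5   # assumed design: large N, short T

# Simulate y_it = gamma*y_{i,t-1} + alpha_i + eps_it with a burn-in period
alpha = rng.normal(size=N)
y = alpha / (1 - gamma)
burn = 50
panel = np.empty((N, T + 1))            # columns hold y_i0, ..., y_iT
for s in range(burn + T + 1):
    y = gamma * y + alpha + rng.normal(size=N)
    if s >= burn:
        panel[:, s - burn] = y

# Fixed effects (within) estimator
y_cur, y_lag = panel[:, 1:], panel[:, :-1]
yc = y_cur - y_cur.mean(axis=1, keepdims=True)
yl = y_lag - y_lag.mean(axis=1, keepdims=True)
gamma_fe = (yc * yl).sum() / (yl ** 2).sum()

# IV estimator on first differences with y_{i,t-2} as instrument
dy = panel[:, 2:] - panel[:, 1:-1]       # y_it - y_{i,t-1}
dy_lag = panel[:, 1:-1] - panel[:, :-2]  # y_{i,t-1} - y_{i,t-2}
z = panel[:, :-2]                        # y_{i,t-2}
gamma_iv = (z * dy).sum() / (z * dy_lag).sum()

print(gamma_fe, gamma_iv)
```

With a short $T$ the within estimate should come out well below the true value of 0.5, while the IV estimate should be close to it.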


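An alternative estimator uses the instrument $(y_{i,t-2} - y_{i,t-3})$ as

\[ \mathrm{Cov}((y_{i,t-1} - y_{i,t-2}), (y_{i,t-2} - y_{i,t-3})) \neq 0 \quad \text{(relevant)} \]

\[ \mathrm{Cov}((\varepsilon_{it} - \varepsilon_{i,t-1}), (y_{i,t-2} - y_{i,t-3})) = 0 \quad \text{(exogenous)} \]

\[ \hat{\gamma}^{IV(2)} = \frac{\sum_{i=1}^{N}\sum_{t=3}^{T} (y_{i,t-2} - y_{i,t-3})(y_{it} - y_{i,t-1})}{\sum_{i=1}^{N}\sum_{t=3}^{T} (y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})} \]

which is consistent if

\[ \operatorname{plim} \frac{1}{N(T-2)} \sum_{i=1}^{N}\sum_{t=3}^{T} (\varepsilon_{it} - \varepsilon_{i,t-1})(y_{i,t-2} - y_{i,t-3}) = 0 \]

whereas the first IV estimator requires

\[ \operatorname{plim} \frac{1}{N(T-1)} \sum_{i=1}^{N}\sum_{t=2}^{T} (\varepsilon_{it} - \varepsilon_{i,t-1})\, y_{i,t-2} = 0 \]

Which estimator do we use?

Use both, adopting a GMM approach.

Note

\[ \operatorname{plim} \frac{1}{N(T-1)} \sum_{i=1}^{N}\sum_{t=2}^{T} (\varepsilon_{it} - \varepsilon_{i,t-1})\, y_{i,t-2} = E[(\varepsilon_{it} - \varepsilon_{i,t-1})\, y_{i,t-2}] = 0 \]

\[ \operatorname{plim} \frac{1}{N(T-2)} \sum_{i=1}^{N}\sum_{t=3}^{T} (\varepsilon_{it} - \varepsilon_{i,t-1})(y_{i,t-2} - y_{i,t-3}) = E[(\varepsilon_{it} - \varepsilon_{i,t-1})(y_{i,t-2} - y_{i,t-3})] = 0 \]

are moment conditions. Each IV estimator imposes one moment condition in estimation. Generally, imposing more moment conditions increases the efficiency of the estimation.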


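Arellano and Bond (1991, Review of Economic Studies).
Example $T = 4$

In period 2: $E[(\varepsilon_{i2} - \varepsilon_{i1})\, y_{i0}] = 0$

In period 3: $E[(\varepsilon_{i3} - \varepsilon_{i2})\, y_{i1}] = 0$, $E[(\varepsilon_{i3} - \varepsilon_{i2})\, y_{i0}] = 0$

In period 4: $E[(\varepsilon_{i4} - \varepsilon_{i3})\, y_{i2}] = 0$, $E[(\varepsilon_{i4} - \varepsilon_{i3})\, y_{i1}] = 0$, $E[(\varepsilon_{i4} - \varepsilon_{i3})\, y_{i0}] = 0$

GMM estimator
Define the vector of transformed error terms

\[ \Delta\varepsilon_i = \begin{pmatrix} \varepsilon_{i2} - \varepsilon_{i1} \\ \vdots \\ \varepsilon_{iT} - \varepsilon_{i,T-1} \end{pmatrix} \]

and the matrix of instruments

\[ Z_i = \begin{pmatrix} [y_{i0}] & 0 & \cdots & 0 \\ 0 & [y_{i0}, y_{i1}] & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & [y_{i0}, \ldots, y_{i,T-2}] \end{pmatrix} \]

where each row contains the instruments that are valid for a given period. Thus, we write compactly

\[ E(Z_i' \Delta\varepsilon_i) = 0 \]

\[ E(Z_i' (\Delta y_i - \gamma \Delta y_{i,-1})) = 0 \]

It can be shown that the GMM estimator is consistent and asymptotically normal.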
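To see the block structure of the instrument matrix for the $T = 4$ example, the following sketch builds $Z_i$ for one individual. The helper name `arellano_bond_instruments` and the numerical values of $y_{i0}, \ldots, y_{i4}$ are hypothetical and serve only to show which lagged levels enter each row.

```python
import numpy as np

def arellano_bond_instruments(y_i):
    """Build Z_i from the levels (y_i0, ..., y_iT); one row per differenced equation."""
    T = len(y_i) - 1
    n_cols = T * (T - 1) // 2           # 1 + 2 + ... + (T-1) instruments in total
    Z = np.zeros((T - 1, n_cols))
    col = 0
    for t in range(2, T + 1):           # equation for period t uses y_i0, ..., y_{i,t-2}
        k = t - 1
        Z[t - 2, col:col + k] = y_i[:k]
        col += k
    return Z

# Assumed levels y_i0, ..., y_i4 for a single individual (T = 4)
Z = arellano_bond_instruments(np.array([10.0, 11.0, 12.0, 13.0, 14.0]))
print(Z)
```

The three rows correspond to periods 2, 3 and 4, carrying one, two and three instruments respectively, which matches the six moment conditions listed for $T = 4$.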


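An autoregressive panel data model with exogenous variables

\[ y_{it} = \beta x_{it} + \gamma y_{i,t-1} + \alpha_i + \varepsilon_{it} \]

Use GMM. Take first differences and calculate

\[ y_{it} - y_{i,t-1} = \beta (x_{it} - x_{i,t-1}) + \gamma (y_{i,t-1} - y_{i,t-2}) + (\varepsilon_{it} - \varepsilon_{i,t-1}) \]

\[ \Delta y_{it} = \beta \Delta x_{it} + \gamma \Delta y_{i,t-1} + \Delta\varepsilon_{it} \]

If $x_{it}$ is strictly exogenous, we have

\[ E(\Delta x_{it} \Delta\varepsilon_{it}) = 0, \quad \forall t \]

and the matrix of instruments

\[ Z_i = \begin{pmatrix} [y_{i0}, \Delta x_{i2}] & 0 & \cdots & 0 \\ 0 & [y_{i0}, y_{i1}, \Delta x_{i3}] & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & [y_{i0}, \ldots, y_{i,T-2}, \Delta x_{iT}] \end{pmatrix} \]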
Chapter 2: VAR Models

Since Sims' (1980) critique of traditional macroeconometric modeling, vector autoregressive (VAR) models have been widely used in macroeconomics. In the traditional approach the typical question asked is ‘What is the optimal response by the monetary authority to movements in macroeconomic variables to achieve given targets?’ Sims argued that a VAR model is an unrestricted model that treats all variables as endogenous “without restrictions based on supposed a priori knowledge” derived from theory.
2.1 Bivariate Structural Model
Let $y_t$, $z_t$ be endogenous in the bivariate first-order structural VAR(1)

\[ y_t = b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt} \qquad (1) \]

\[ z_t = b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \qquad (2) \]

Assumptions: (a) $y_t$, $z_t$ are stationary processes; (b) $\varepsilon_{yt}$, $\varepsilon_{zt}$ are white noise processes, $\varepsilon_{yt} \sim WN(0, \sigma_y^2)$, $\varepsilon_{zt} \sim WN(0, \sigma_z^2)$; (c) $\varepsilon_{yt}$ and $\varepsilon_{zt}$ are uncorrelated.

There are feedback effects between $y_t$ and $z_t$.

Time lag effects

$\gamma_{12}$ → time lag effect of $z_{t-1}$ on $y_t$
$\gamma_{21}$ → time lag effect of $y_{t-1}$ on $z_t$

Contemporaneous effects

$-b_{12}$ → contemporaneous effect of $z_t$ on $y_t$

$-b_{21}$ → contemporaneous effect of $y_t$ on $z_t$


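Derive reduced-form VAR(1)
(1), (2) =>

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} - \begin{pmatrix} 0 & b_{12} \\ b_{21} & 0 \end{pmatrix} \begin{pmatrix} y_t \\ z_t \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} \]

\[ \begin{pmatrix} 1 & b_{12} \\ b_{21} & 1 \end{pmatrix} \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} \]

or

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t \]

where

\[ B = \begin{pmatrix} 1 & b_{12} \\ b_{21} & 1 \end{pmatrix}, \quad x_t = \begin{pmatrix} y_t \\ z_t \end{pmatrix}, \quad \Gamma_0 = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix}, \quad \Gamma_1 = \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix}, \quad \varepsilon_t = \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} \]

Premultiply by $B^{-1}$

\[ x_t = A_0 + A_1 x_{t-1} + e_t \]

where $A_0 = B^{-1}\Gamma_0$, $A_1 = B^{-1}\Gamma_1$, $e_t = B^{-1}\varepsilon_t$

\[ y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t} \]

\[ z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t} \]

but the reduced-form errors $e_t = B^{-1}\varepsilon_t$ are combinations of the structural errors, with

\[ B^{-1} = \frac{1}{\det(B)} \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix} = \frac{1}{1 - b_{12} b_{21}} \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix} \]

Transformed errors (reduced-form errors)

\[ e_t = B^{-1}\varepsilon_t = \frac{1}{1 - b_{12} b_{21}} \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix} \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} = \frac{1}{1 - b_{12} b_{21}} \begin{pmatrix} \varepsilon_{yt} - b_{12}\varepsilon_{zt} \\ -b_{21}\varepsilon_{yt} + \varepsilon_{zt} \end{pmatrix} \]

so

\[ e_{1t} = \frac{\varepsilon_{yt} - b_{12}\varepsilon_{zt}}{1 - b_{12} b_{21}}, \qquad e_{2t} = \frac{\varepsilon_{zt} - b_{21}\varepsilon_{yt}}{1 - b_{12} b_{21}} \]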
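Properties of reduced-form errors

Mean

\[ E(e_{1t}) = \frac{E(\varepsilon_{yt} - b_{12}\varepsilon_{zt})}{1 - b_{12} b_{21}} = 0, \quad \text{since } E(\varepsilon_{yt}) = E(\varepsilon_{zt}) = 0 \]

Variance

\[ \mathrm{Var}(e_{1t}) = E[(e_{1t} - E(e_{1t}))^2] = E\left[ \left( \frac{\varepsilon_{yt} - b_{12}\varepsilon_{zt}}{1 - b_{12} b_{21}} \right)^2 \right] \]

\[ = \frac{1}{(1 - b_{12} b_{21})^2} E(\varepsilon_{yt}^2 - 2 b_{12}\varepsilon_{yt}\varepsilon_{zt} + b_{12}^2 \varepsilon_{zt}^2) \]

\[ = \frac{1}{(1 - b_{12} b_{21})^2} \left[ E(\varepsilon_{yt}^2) - 2 b_{12} E(\varepsilon_{yt}\varepsilon_{zt}) + b_{12}^2 E(\varepsilon_{zt}^2) \right] \]

\[ = \frac{\sigma_y^2 + b_{12}^2 \sigma_z^2}{(1 - b_{12} b_{21})^2} \]

since $\mathrm{Cov}(\varepsilon_{yt}, \varepsilon_{zt}) = E[(\varepsilon_{yt} - E(\varepsilon_{yt}))(\varepsilon_{zt} - E(\varepsilon_{zt}))] = E(\varepsilon_{yt}\varepsilon_{zt}) = 0$

Autocovariance

\[ \mathrm{Cov}(e_{1t}, e_{1,t-i}) = E[(e_{1t} - E(e_{1t}))(e_{1,t-i} - E(e_{1,t-i}))] = \frac{E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{y,t-i} - b_{12}\varepsilon_{z,t-i})]}{(1 - b_{12} b_{21})^2} = 0 \quad \text{for } i \neq 0 \]

since

\[ E(\varepsilon_{yt}\varepsilon_{y,t-i}) = -b_{12} E(\varepsilon_{yt}\varepsilon_{z,t-i}) = -b_{12} E(\varepsilon_{zt}\varepsilon_{y,t-i}) = b_{12}^2 E(\varepsilon_{zt}\varepsilon_{z,t-i}) = 0 \]

Hence

\[ e_{1t} \sim WN\left(0, \frac{\sigma_y^2 + b_{12}^2 \sigma_z^2}{(1 - b_{12} b_{21})^2}\right), \qquad e_{2t} \sim WN\left(0, \frac{\sigma_z^2 + b_{21}^2 \sigma_y^2}{(1 - b_{12} b_{21})^2}\right) \]

Cross-correlation (covariance)

\[ \mathrm{Cov}(e_{1t}, e_{2t}) = E[(e_{1t} - E(e_{1t}))(e_{2t} - E(e_{2t}))] = \frac{E[(\varepsilon_{yt} - b_{12}\varepsilon_{zt})(\varepsilon_{zt} - b_{21}\varepsilon_{yt})]}{(1 - b_{12} b_{21})^2} \]

\[ = \frac{1}{(1 - b_{12} b_{21})^2} E(\varepsilon_{yt}\varepsilon_{zt} - b_{21}\varepsilon_{yt}^2 - b_{12}\varepsilon_{zt}^2 + b_{12} b_{21}\varepsilon_{zt}\varepsilon_{yt}) \]

\[ = -\frac{b_{21}\sigma_y^2 + b_{12}\sigma_z^2}{(1 - b_{12} b_{21})^2} \]

The errors in the reduced-form equations are correlated!

Only when $b_{12} = b_{21} = 0$ is there no correlation.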
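Variance-covariance matrix of the errors

\[ \Sigma = E(e_t e_t') = \begin{pmatrix} E(e_{1t}^2) & E(e_{1t} e_{2t}) \\ E(e_{1t} e_{2t}) & E(e_{2t}^2) \end{pmatrix} = \begin{pmatrix} \mathrm{Var}(e_{1t}) & \mathrm{Cov}(e_{1t}, e_{2t}) \\ \mathrm{Cov}(e_{1t}, e_{2t}) & \mathrm{Var}(e_{2t}) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{pmatrix} \]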
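The variance and covariance formulas for the reduced-form errors can be checked numerically against the matrix expression $B^{-1}\Sigma_\varepsilon (B^{-1})'$. The sketch below uses illustrative parameter values ($b_{12} = 0.4$, $b_{21} = 0.3$, $\sigma_y^2 = 1$, $\sigma_z^2 = 2$), which are assumptions made for the example only.

```python
import numpy as np

# Assumed structural parameters (illustrative values only)
b12, b21 = 0.4, 0.3
sig_y2, sig_z2 = 1.0, 2.0

B = np.array([[1.0, b12], [b21, 1.0]])
Sigma_eps = np.diag([sig_y2, sig_z2])     # structural errors are uncorrelated

# Matrix route: Var(e_t) = B^{-1} Sigma_eps (B^{-1})'
B_inv = np.linalg.inv(B)
Sigma = B_inv @ Sigma_eps @ B_inv.T

# Element-by-element route, using the closed-form expressions
d2 = (1 - b12 * b21) ** 2
var_e1 = (sig_y2 + b12 ** 2 * sig_z2) / d2
var_e2 = (sig_z2 + b21 ** 2 * sig_y2) / d2
cov_e12 = -(b21 * sig_y2 + b12 * sig_z2) / d2

print(Sigma)
print(var_e1, var_e2, cov_e12)
```

Setting `b12 = b21 = 0` in this sketch makes the off-diagonal element vanish, in line with the remark that the reduced-form errors are uncorrelated only in that case.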


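2.2 Multivariate Structural Model

Consider the K-dimensional time series vector $x_t = (y_{1t}, \ldots, y_{Kt})'$ generated by the reduced-form VAR(1)

\[ y_{1t} = a_{10} + a_{11} y_{1,t-1} + a_{12} y_{2,t-1} + \ldots + a_{1K} y_{K,t-1} + e_{1t} \]

\[ y_{2t} = a_{20} + a_{21} y_{1,t-1} + a_{22} y_{2,t-1} + \ldots + a_{2K} y_{K,t-1} + e_{2t} \]

\[ \vdots \]

\[ y_{Kt} = a_{K0} + a_{K1} y_{1,t-1} + a_{K2} y_{2,t-1} + \ldots + a_{KK} y_{K,t-1} + e_{Kt} \]

=>

\[ \begin{pmatrix} y_{1t} \\ y_{2t} \\ \vdots \\ y_{Kt} \end{pmatrix} = \begin{pmatrix} a_{10} \\ a_{20} \\ \vdots \\ a_{K0} \end{pmatrix} + \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1K} \\ a_{21} & a_{22} & \cdots & a_{2K} \\ \vdots & & & \vdots \\ a_{K1} & a_{K2} & \cdots & a_{KK} \end{pmatrix} \begin{pmatrix} y_{1,t-1} \\ y_{2,t-1} \\ \vdots \\ y_{K,t-1} \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \\ \vdots \\ e_{Kt} \end{pmatrix} \]

\[ x_t = A_0 + A_1 x_{t-1} + e_t \]

where $x_t$ is the $(K \times 1)$ vector of variables, $A_0$ the $(K \times 1)$ vector of intercepts, $A_1$ the $(K \times K)$ coefficient matrix, $e_t$ the $(K \times 1)$ vector of errors, and

\[ e_t \sim WN(0, \Sigma) \]

$E[e_t] = 0$, $E[e_t e_t'] = \Sigma$, and $E[e_t e_s'] = 0$ for $s \neq t$, where the variance-covariance matrix $\Sigma$ is time-invariant, symmetric and non-singular

\[ \Sigma = E(e_t e_t') = \begin{pmatrix} E(e_{1t}^2) & \cdots & E(e_{1t} e_{Kt}) \\ E(e_{1t} e_{2t}) & \cdots & E(e_{2t} e_{Kt}) \\ \vdots & & \vdots \\ E(e_{1t} e_{Kt}) & \cdots & E(e_{Kt}^2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1K} \\ \sigma_{12} & \cdots & \sigma_{2K} \\ \vdots & & \vdots \\ \sigma_{1K} & \cdots & \sigma_K^2 \end{pmatrix} \]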

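More generally, the structural VAR(p)

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \ldots + \Gamma_p x_{t-p} + \varepsilon_t \]

where $B$ is the $(K \times K)$ matrix of contemporaneous (structural) effects, $\Gamma_j$ $(j = 1, \ldots, p)$ are $(K \times K)$ matrices of (structural) lagged coefficients, and $\varepsilon_t$ is the vector of structural errors.

Reduced-form VAR(p)

\[ x_t = A_0 + A_1 x_{t-1} + A_2 x_{t-2} + \ldots + A_p x_{t-p} + e_t \]

where $A_0 = B^{-1}\Gamma_0$, $A_j = B^{-1}\Gamma_j$ $(j = 1, \ldots, p)$ are $(K \times K)$ coefficient matrices, and

\[ e_t = B^{-1}\varepsilon_t \]

\[ \Sigma_\varepsilon = E(\varepsilon_t \varepsilon_t') = \begin{pmatrix} E(\varepsilon_{1t}^2) & \cdots & E(\varepsilon_{1t}\varepsilon_{Kt}) \\ E(\varepsilon_{1t}\varepsilon_{2t}) & \cdots & E(\varepsilon_{2t}\varepsilon_{Kt}) \\ \vdots & & \vdots \\ E(\varepsilon_{1t}\varepsilon_{Kt}) & \cdots & E(\varepsilon_{Kt}^2) \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ 0 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_K^2 \end{pmatrix} \]

$\Sigma_\varepsilon = E(\varepsilon_t \varepsilon_t')$ → diagonal

\[ \Sigma = E(e_t e_t') = E(B^{-1}\varepsilon_t (B^{-1}\varepsilon_t)') \]
\[ = E(B^{-1}\varepsilon_t \varepsilon_t' (B^{-1})') \]
\[ = B^{-1} E(\varepsilon_t \varepsilon_t') (B^{-1})' \quad \text{→ non-diagonal} \]
\[ = B^{-1} \Sigma_\varepsilon (B^{-1})' \]
\[ = B^{-1} I_K (B^{-1})', \quad \text{if } \Sigma_\varepsilon = I_K \text{ (identity matrix)} \]
\[ = B^{-1} (B^{-1})' \]

or

\[ \Sigma = \mathrm{Var}(e_t) = \mathrm{Var}(B^{-1}\varepsilon_t) \]
\[ = B^{-1} \mathrm{Var}(\varepsilon_t) (B^{-1})' \]
\[ = B^{-1} \Sigma_\varepsilon (B^{-1})' \]
\[ = B^{-1} I_K (B^{-1})', \quad \text{if } \Sigma_\varepsilon = I_K \text{ (identity matrix)} \]
\[ = B^{-1} (B^{-1})' \]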


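2.3 Stationarity

Vector version of weak stationarity
Mean

\[ E(x_t) = \mu \]

where $\mu = (\mu_1, \mu_2, \ldots, \mu_K)'$ is independent of $t$

Variance-covariance

\[ E[(x_t - \mu)(x_t - \mu)'] = \Sigma_x \]

where $\Sigma_x$ $(K \times K)$ is independent of $t$

Conditions for stationarity

Consider the univariate AR(p) (for illustration)

\[ y_t = \delta + \phi_1 y_{t-1} + \ldots + \phi_p y_{t-p} + u_t \]

\[ \Rightarrow \phi(L) y_t = \delta + u_t \]

\[ \phi(z) = 1 - \phi_1 z - \ldots - \phi_p z^p \]

factorise

\[ = (1 - \lambda_1 z) \times (1 - \lambda_2 z) \times \ldots \times (1 - \lambda_p z) \]

roots

\[ z_i = \frac{1}{\lambda_i}, \quad i = 1, \ldots, p \]

Stationarity and stability require the inverses of the roots of the $p$th-order polynomial to lie inside the unit circle

\[ |\lambda_i| < 1 \ (\text{or } |z_i| > 1), \quad i = 1, \ldots, p \]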
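Infinite Moving Average (IMA) or Wold representation

Let $\phi(L)^{-1} = f(L) = \sum_{j=0}^{\infty} f_j L^j$, then

\[ y_t = \phi(L)^{-1} (\delta + u_t) \]

\[ \Rightarrow y_t = \phi(1)^{-1}\delta + \sum_{j=0}^{\infty} f_j u_{t-j} \]

\[ \Rightarrow y_t = \frac{\delta}{1 - \phi_1 - \ldots - \phi_p} + \sum_{j=0}^{\infty} f_j u_{t-j} \]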


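Bivariate VAR(1)

\[ x_t = A_0 + A_1 x_{t-1} + e_t \]
\[ \Rightarrow x_t - A_1 x_{t-1} = A_0 + e_t \]
\[ \Rightarrow x_t - A_1 L x_t = A_0 + e_t \]
\[ \Rightarrow (I_2 - A_1 L) x_t = A_0 + e_t \]

Does $(I_2 - A_1 L)^{-1}$ exist? If so, pre-multiply by $(I_2 - A_1 L)^{-1}$.

Determinantal equation

\[ \det(I_2 - A_1 L) = (1 - a_{11} L)(1 - a_{22} L) - a_{12} a_{21} L^2 \]

\[ = 1 - (a_{11} + a_{22}) L + (a_{11} a_{22} - a_{12} a_{21}) L^2 \]

\[ = (1 - \lambda_1 z) \times (1 - \lambda_2 z) \]

roots

\[ z_1 = \frac{1}{\lambda_1}, \qquad z_2 = \frac{1}{\lambda_2} \]

Stationarity and stability require the inverses of the roots of the 2nd-order polynomial to lie inside the unit circle

\[ |\lambda_1| < 1, \ |\lambda_2| < 1 \quad (\text{or } |z_1| > 1, \ |z_2| > 1) \]

If one of the two roots is one, then both $y_t$ and $z_t$ are non-stationary.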
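In practice the condition can be checked by computing the eigenvalues of $A_1$, which coincide with the inverse roots $\lambda_1, \lambda_2$ of the determinantal equation. The following is a minimal sketch; the coefficient matrix is an assumed example (it matches the VAR(1) used later in the impulse response illustration).

```python
import numpy as np

# Assumed reduced-form coefficient matrix A_1
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])

# Eigenvalues of A_1 are the inverse roots lambda_i of det(I - A_1 z) = 0
lam = np.linalg.eigvals(A1)
roots = 1 / lam                                  # z_i = 1 / lambda_i
is_stationary = bool(np.all(np.abs(lam) < 1))    # all |lambda_i| < 1

print(sorted(lam.real), sorted(roots.real), is_stationary)
```

Here the inverse roots are 0.5 and 0.9, both inside the unit circle, so this VAR(1) is stationary.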
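2.4 Identification

Consider a bivariate structural VAR(1)

\[ y_t = b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt} \]
\[ z_t = b_{20} - b_{21} y_t + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \]

The structural system is not directly estimable by OLS since

\[ \mathrm{Cov}(z_t, \varepsilon_{yt}) \neq 0 \quad \text{and} \quad \mathrm{Cov}(y_t, \varepsilon_{zt}) \neq 0 \]

which implies biased and inconsistent estimates!

Consider the reduced-form VAR(1)

\[ y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t} \]

\[ z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t} \]

Here OLS is applicable.

Can we recover all the information present in the structural model?

The structural model is underidentified. Why? The reduced form delivers nine estimates (six coefficients, two error variances and one covariance), whereas the structural model has ten unknowns (eight coefficients and two error variances).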
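However, if we set $b_{21} = 0$ =>

\[ y_t = b_{10} - b_{12} z_t + \gamma_{11} y_{t-1} + \gamma_{12} z_{t-1} + \varepsilon_{yt} \]
\[ z_t = b_{20} + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \]

or

\[ \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} \]

Let

\[ B = \begin{pmatrix} 1 & b_{12} \\ 0 & 1 \end{pmatrix} \Rightarrow B^{-1} = \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \]

Premultiply by $B^{-1}$ =>

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} b_{10} \\ b_{20} \end{pmatrix} + \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_{11} & \gamma_{12} \\ \gamma_{21} & \gamma_{22} \end{pmatrix} \begin{pmatrix} y_{t-1} \\ z_{t-1} \end{pmatrix} + \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} \]

or

\[ y_t = (b_{10} - b_{12} b_{20}) + (\gamma_{11} - b_{12}\gamma_{21}) y_{t-1} + (\gamma_{12} - b_{12}\gamma_{22}) z_{t-1} + \varepsilon_{yt} - b_{12}\varepsilon_{zt} \]
\[ z_t = b_{20} + \gamma_{21} y_{t-1} + \gamma_{22} z_{t-1} + \varepsilon_{zt} \]

or

\[ y_t = a_{10} + a_{11} y_{t-1} + a_{12} z_{t-1} + e_{1t} \]

\[ z_t = a_{20} + a_{21} y_{t-1} + a_{22} z_{t-1} + e_{2t} \]

where $a_{10} = b_{10} - b_{12} b_{20}$, $a_{11} = \gamma_{11} - b_{12}\gamma_{21}$, $a_{12} = \gamma_{12} - b_{12}\gamma_{22}$, $a_{20} = b_{20}$, $a_{21} = \gamma_{21}$, $a_{22} = \gamma_{22}$, $e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt}$, $e_{2t} = \varepsilon_{zt}$

Notice that $y_t$ is affected by both $\varepsilon_{yt}$ and $\varepsilon_{zt}$, whereas $z_t$ is affected only by $\varepsilon_{zt}$ => causal ordering

Variance-covariance matrix of the errors

\[ \Sigma = E(e_t e_t') = \begin{pmatrix} E(e_{1t}^2) & E(e_{1t} e_{2t}) \\ E(e_{1t} e_{2t}) & E(e_{2t}^2) \end{pmatrix} = \begin{pmatrix} \sigma_y^2 + b_{12}^2 \sigma_z^2 & -b_{12}\sigma_z^2 \\ -b_{12}\sigma_z^2 & \sigma_z^2 \end{pmatrix} \]

Orthogonalize residuals using the Choleski decomposition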
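With $b_{21} = 0$ the three distinct elements of $\Sigma$ are enough to back out $b_{12}$, $\sigma_y^2$ and $\sigma_z^2$. The sketch below does this for an assumed covariance matrix (generated from $b_{12} = -0.8$ and unit structural variances, chosen for illustration) and also shows the equivalent Choleski factorisation with $z_t$ ordered first.

```python
import numpy as np

# Assumed reduced-form covariance matrix of (e_1t, e_2t)
Sigma = np.array([[1.64, 0.8],
                  [0.8, 1.0]])

# Solve the three equations implied by b21 = 0
sig_z2 = Sigma[1, 1]                      # Var(e_2t) = sigma_z^2
b12 = -Sigma[0, 1] / sig_z2               # Cov(e_1t, e_2t) = -b12 * sigma_z^2
sig_y2 = Sigma[0, 0] - b12 ** 2 * sig_z2  # Var(e_1t) = sigma_y^2 + b12^2 * sigma_z^2

# Same information from a Choleski factor, with z ordered before y
P = np.array([[0, 1], [1, 0]])            # permutation: (y, z) -> (z, y)
L = np.linalg.cholesky(P @ Sigma @ P)     # lower triangular, L @ L.T = reordered Sigma

print(b12, sig_y2, sig_z2)
print(L)
```

The lower-triangular factor has $\sigma_z$ in its first diagonal entry and $-b_{12}\sigma_z$ below it, so the ordering of the variables determines which contemporaneous effect is set to zero.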
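2.5 Estimation
Consider the reduced-form VAR(p)

\[ x_t = A_0 + A_1 x_{t-1} + A_2 x_{t-2} + \ldots + A_p x_{t-p} + e_t \]

Under conditional normality,

\[ x_t \mid x_{t-1}, x_{t-2}, \ldots, x_{t-p} \sim N(A_0 + A_1 x_{t-1} + A_2 x_{t-2} + \ldots + A_p x_{t-p}, \Sigma) \]

More compactly, let

\[ z_t = (1, x_{t-1}', \ldots, x_{t-p}')' \quad ((Kp+1) \times 1), \qquad \Pi' = (A_0 \ A_1 \ \ldots \ A_p) \quad (K \times (Kp+1)) \]

The $j$th row of $\Pi'$ contains the parameters of the $j$th equation in the VAR.

Thus, write

\[ x_t \mid x_{t-1}, \ldots, x_{t-p} \sim N(\Pi' z_t, \Sigma) \]

Derive the log-likelihood function

\[ LLF(\theta) = -\frac{TK}{2}\log(2\pi) - \frac{T}{2}\log|\Sigma| - \frac{1}{2}\sum_{t=1}^{T} (x_t - \Pi' z_t)' \Sigma^{-1} (x_t - \Pi' z_t) \]

The first order conditions (FOCs)

\[ \hat{\Pi}' = \left[ \sum_{t=1}^{T} x_t z_t' \right] \left[ \sum_{t=1}^{T} z_t z_t' \right]^{-1} \]

The $j$th row of $\hat{\Pi}'$ is

\[ \hat{\pi}_j' = \left[ \sum_{t=1}^{T} x_{jt} z_t' \right] \left[ \sum_{t=1}^{T} z_t z_t' \right]^{-1}, \quad j = 1, \ldots, K \]

which amounts to equation-by-equation OLS. The MLE for the error variance-covariance matrix

\[ \hat{\Sigma}^{MLE} = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_t \hat{e}_t' \]

\[ \hat{\sigma}_j^{2,MLE} = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_{jt}^2 \quad \text{for variances} \]

\[ \hat{\sigma}_{jk}^{MLE} = \frac{1}{T} \sum_{t=1}^{T} \hat{e}_{jt} \hat{e}_{kt} \quad \text{for covariances} \]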
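The estimation formulas above translate directly into a few lines of linear algebra. The sketch below simulates a bivariate VAR(1) and recovers the coefficients and $\hat{\Sigma}$ by OLS; the true parameter values, the seed and the sample size are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed data-generating process: bivariate VAR(1) with identity error covariance
A0 = np.array([0.1, -0.2])
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
T = 5000
x = np.zeros((T + 1, 2))
for t in range(1, T + 1):
    x[t] = A0 + A1 @ x[t - 1] + rng.normal(size=2)

# Stack z_t' = (1, x_{t-1}') row by row and regress x_t on z_t
Z = np.column_stack([np.ones(T), x[:-1]])
X = x[1:]
Pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)   # (Kp+1) x K; column j holds equation j

# MLE of the error covariance matrix divides by T (no degrees-of-freedom correction)
resid = X - Z @ Pi_hat
Sigma_hat = resid.T @ resid / T

A0_hat = Pi_hat[0]
A1_hat = Pi_hat[1:].T
print(A0_hat, A1_hat, Sigma_hat, sep="\n")
```

Because every equation has the same regressors, solving the system once for all columns gives the same numbers as running OLS equation by equation.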
2.6 Model selection criteria

How do we choose the lag order?

1st Way
Adding lags reduces the determinant of the variance-covariance matrix of the reduced-form errors, $|\Sigma|$, but also leads to a loss of degrees of freedom (df).

Model selection criteria trade off the reduction of $|\Sigma|$ against a more parsimonious model.

Akaike Information Criterion (AIC)

\[ AIC = \log|\Sigma| + \frac{2}{T} N \]

Schwarz Bayesian Criterion (SBC)

\[ SBC = \log|\Sigma| + \frac{\log T}{T} N \]

where $N$ → total number of estimated parameters, $N = K(Kp+1)$, and $T$ is kept fixed across the models compared.

Minimization: choose the lag order that minimizes the criterion.

The marginal cost of adding regressors is greater under SBC than under AIC.

2nd Way
Conduct a series of Likelihood Ratio (LR) tests.
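As a small illustration of the first approach, the criteria can be computed from an estimated error covariance matrix as follows. This is a sketch only: the function name is hypothetical, and it assumes that the parameter count is $K(Kp+1)$, i.e. $Kp$ slope coefficients plus an intercept in each of the $K$ equations.

```python
import numpy as np

def var_information_criteria(Sigma_hat, K, p, T):
    """Return (AIC, SBC) for a VAR(p) with K variables estimated on T observations."""
    n_params = K * (K * p + 1)                    # assumed count: K*p slopes + intercept per equation
    log_det = np.linalg.slogdet(Sigma_hat)[1]     # log|Sigma|, computed stably
    aic = log_det + 2.0 * n_params / T
    sbc = log_det + np.log(T) * n_params / T
    return aic, sbc

# Toy input (assumed): identity covariance, so log|Sigma| = 0
aic, sbc = var_information_criteria(np.eye(2), K=2, p=1, T=100)
print(aic, sbc)
```

In an application the function would be evaluated for each candidate $p$ on the same estimation sample, and the lag order with the smallest value chosen.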
2.7 Impulse response analysis

VAR models concentrate on shocks. First the relevant shocks are identified, and then the response of the system to shocks is described by analysing impulse responses (the propagation mechanism of the shocks).

Consider the bivariate reduced-form VAR(1)

\[ x_t = A_1 x_{t-1} + e_t \]

Backward iteration implies

\[ x_t = A_1 (A_1 x_{t-2} + e_{t-1}) + e_t \]

After $n$ iterations

\[ x_t = \sum_{i=0}^{n} A_1^i e_{t-i} + A_1^{n+1} x_{t-n-1} \]

As $n \to \infty$

\[ x_t = \sum_{i=0}^{\infty} A_1^i e_{t-i}, \quad \text{since } A_1^{n+1} x_{t-n-1} \to 0 \]

Infinite Moving Average (IMA) or Wold representation
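\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \sum_{i=0}^{\infty} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^i \frac{1}{1 - b_{12} b_{21}} \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-i} \\ \varepsilon_{z,t-i} \end{pmatrix} \]

If we let

\[ \phi_i = \frac{A_1^i}{1 - b_{12} b_{21}} \begin{pmatrix} 1 & -b_{12} \\ -b_{21} & 1 \end{pmatrix} \]

Structural Infinite Moving Average (IMA) representation

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \sum_{i=0}^{\infty} \begin{pmatrix} \phi_{11}(i) & \phi_{12}(i) \\ \phi_{21}(i) & \phi_{22}(i) \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-i} \\ \varepsilon_{z,t-i} \end{pmatrix} \]

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} \phi_{11}(0) & \phi_{12}(0) \\ \phi_{21}(0) & \phi_{22}(0) \end{pmatrix} \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} + \begin{pmatrix} \phi_{11}(1) & \phi_{12}(1) \\ \phi_{21}(1) & \phi_{22}(1) \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-1} \\ \varepsilon_{z,t-1} \end{pmatrix} + \ldots \]

\[ y_t = \phi_{11}(0)\varepsilon_{yt} + \phi_{12}(0)\varepsilon_{zt} + \phi_{11}(1)\varepsilon_{y,t-1} + \phi_{12}(1)\varepsilon_{z,t-1} + \ldots \]
\[ z_t = \phi_{21}(0)\varepsilon_{yt} + \phi_{22}(0)\varepsilon_{zt} + \phi_{21}(1)\varepsilon_{y,t-1} + \phi_{22}(1)\varepsilon_{z,t-1} + \ldots \]

Impulses hitting the system

\[ \phi_0 = \begin{pmatrix} \phi_{11}(0) & \phi_{12}(0) \\ \phi_{21}(0) & \phi_{22}(0) \end{pmatrix} \quad \text{matrix of impact multipliers} \]

$\phi_{11}(0)$ → instantaneous effect of $\varepsilon_{yt}$ on $y_t$

$\phi_{12}(0)$ → instantaneous effect of $\varepsilon_{zt}$ on $y_t$

$\phi_{21}(0)$ → instantaneous effect of $\varepsilon_{yt}$ on $z_t$

$\phi_{22}(0)$ → instantaneous effect of $\varepsilon_{zt}$ on $z_t$

\[ \phi_1 = \begin{pmatrix} \phi_{11}(1) & \phi_{12}(1) \\ \phi_{21}(1) & \phi_{22}(1) \end{pmatrix} \]

$\phi_{11}(1)$ → 1-period effect of $\varepsilon_{y,t-1}$ on $y_t$
$\phi_{12}(1)$ → 1-period effect of $\varepsilon_{z,t-1}$ on $y_t$
$\phi_{21}(1)$ → 1-period effect of $\varepsilon_{y,t-1}$ on $z_t$
$\phi_{22}(1)$ → 1-period effect of $\varepsilon_{z,t-1}$ on $z_t$

$\phi_{11}(n)$ → $n$-period effect of $\varepsilon_{y,t-n}$ on $y_t$ (or $\varepsilon_{yt}$ on $y_{t+n}$)
$\sum_{i=0}^{n} \phi_{11}(i)$ → accumulated effect of $\varepsilon_{yt}$ on $\{y_t\}$ after $n$ periods

$\phi_{21}(n)$ → $n$-period effect of $\varepsilon_{y,t-n}$ on $z_t$ (or $\varepsilon_{yt}$ on $z_{t+n}$)
$\sum_{i=0}^{n} \phi_{21}(i)$ → accumulated effect of $\varepsilon_{yt}$ on $\{z_t\}$ after $n$ periods

Therefore

$\sum_{i=0}^{\infty} \phi_{jk}(i)$ → long-run multipliers
$\phi_{jk}(i)$ versus $i$ → impulse response functions

\[ \lim_{i \to \infty} \phi_{jk}(i) = 0, \quad j, k = 1, 2 \]

No structural shock should have a long-run impact. If the variables are stationary, then shocks have transitory effects.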


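In his famous article Sims (1980) proposed the following identification strategy. To identify the shocks, use the Choleski decomposition: in the structural model, $b_{21} = 0$ =>

\[ e_{1t} = \varepsilon_{yt} - b_{12}\varepsilon_{zt}, \qquad e_{2t} = \varepsilon_{zt} \]

Structural Infinite Moving Average (IMA) representation

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \sum_{i=0}^{\infty} \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}^i \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-i} \\ \varepsilon_{z,t-i} \end{pmatrix} \]

If we let

\[ \phi_i = A_1^i \begin{pmatrix} 1 & -b_{12} \\ 0 & 1 \end{pmatrix} \]

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \sum_{i=0}^{\infty} \begin{pmatrix} \phi_{11}(i) & \phi_{12}(i) \\ \phi_{21}(i) & \phi_{22}(i) \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-i} \\ \varepsilon_{z,t-i} \end{pmatrix} \]

\[ \begin{pmatrix} y_t \\ z_t \end{pmatrix} = \begin{pmatrix} \phi_{11}(0) & \phi_{12}(0) \\ 0 & \phi_{22}(0) \end{pmatrix} \begin{pmatrix} \varepsilon_{yt} \\ \varepsilon_{zt} \end{pmatrix} + \begin{pmatrix} \phi_{11}(1) & \phi_{12}(1) \\ \phi_{21}(1) & \phi_{22}(1) \end{pmatrix} \begin{pmatrix} \varepsilon_{y,t-1} \\ \varepsilon_{z,t-1} \end{pmatrix} + \ldots \]

\[ y_t = \phi_{11}(0)\varepsilon_{yt} + \phi_{12}(0)\varepsilon_{zt} + \phi_{11}(1)\varepsilon_{y,t-1} + \phi_{12}(1)\varepsilon_{z,t-1} + \ldots \]
\[ z_t = \phi_{22}(0)\varepsilon_{zt} + \phi_{21}(1)\varepsilon_{y,t-1} + \phi_{22}(1)\varepsilon_{z,t-1} + \ldots \]

Asymmetry → $z_t$ prior to $y_t$ (causal ordering)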


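Example of calculation of impulse response functions (IRFs)
• Set $x_{t-1} = \ldots = x_{t-p} = 0$

• Set $\varepsilon_{jt} = 1$ and $\varepsilon_{kt} = 0$ for $k \neq j$

• Simulate the system for dates $t, t+1, t+2, \ldots, t+n$

Assume VAR(1)

\[ y_t = 0.7 y_{t-1} + 0.2 z_{t-1} + e_{1t} \]
\[ z_t = 0.2 y_{t-1} + 0.7 z_{t-1} + e_{2t} \]

where $a_{10} = a_{20} = 0$ (for simplicity) and the reduced-form errors are given by

\[ e_{1t} = \varepsilon_{yt} + 0.8\varepsilon_{zt}, \quad b_{12} = -0.8 \]

\[ e_{2t} = \varepsilon_{zt} \]

Asymmetry → $z_t$ prior to $y_t$ (causal ordering)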
At period $t$
set $\varepsilon_{zt} = 1$, $\varepsilon_{yt} = 0$ and $y_{t-1} = z_{t-1} = 0$
=>
\[ e_{1t} = 0 + 0.8(1) = 0.8 \]
\[ e_{2t} = 1 \]
\[ y_t = 0.7(0) + 0.2(0) + 0.8 = 0.8 \]
\[ z_t = 0.2(0) + 0.7(0) + 1 = 1 \]

At period $t+1$
set $\varepsilon_{y,t+1} = 0$, $\varepsilon_{z,t+1} = 0$
=>
\[ y_{t+1} = 0.7 y_t + 0.2 z_t = 0.7(0.8) + 0.2(1) + 0 = 0.76 \]
\[ z_{t+1} = 0.2 y_t + 0.7 z_t = 0.2(0.8) + 0.7(1) + 0 = 0.86 \]

At period $t+2$
set $\varepsilon_{y,t+2} = 0$, $\varepsilon_{z,t+2} = 0$
=>
\[ y_{t+2} = 0.7 y_{t+1} + 0.2 z_{t+1} = 0.7(0.76) + 0.2(0.86) + 0 = 0.704 \]
\[ z_{t+2} = 0.2 y_{t+1} + 0.7 z_{t+1} = 0.2(0.76) + 0.7(0.86) + 0 = 0.754 \]

At period $t+3$
set $\varepsilon_{y,t+3} = 0$, $\varepsilon_{z,t+3} = 0$
=>
\[ y_{t+3} = 0.7 y_{t+2} + 0.2 z_{t+2} = 0.7(0.704) + 0.2(0.754) + 0 = 0.6436 \]
\[ z_{t+3} = 0.2 y_{t+2} + 0.7 z_{t+2} = 0.2(0.704) + 0.7(0.754) + 0 = 0.6686 \]

At period $t+4$
set $\varepsilon_{y,t+4} = 0$, $\varepsilon_{z,t+4} = 0$
=>
\[ y_{t+4} = 0.7 y_{t+3} + 0.2 z_{t+3} = 0.7(0.6436) + 0.2(0.6686) + 0 = 0.584 \]
\[ z_{t+4} = 0.2 y_{t+3} + 0.7 z_{t+3} = 0.2(0.6436) + 0.7(0.6686) + 0 = 0.597 \]

Stationarity assures that the impulse responses ultimately decay

\[ \lim_{n \to \infty} y_{t+n} = 0 \]

\[ \lim_{n \to \infty} z_{t+n} = 0 \]
Similarly, a shock on the other variable
At period $t$
set $\varepsilon_{yt} = 1$, $\varepsilon_{zt} = 0$ and $y_{t-1} = z_{t-1} = 0$
=>
\[ e_{1t} = 1 + 0.8(0) = 1 \]
\[ e_{2t} = 0 \]
\[ y_t = 0.7(0) + 0.2(0) + 1 = 1 \]
\[ z_t = 0.2(0) + 0.7(0) + 0 = 0 \]

At period $t+1$
set $\varepsilon_{y,t+1} = 0$, $\varepsilon_{z,t+1} = 0$
=>
\[ y_{t+1} = 0.7 y_t + 0.2 z_t = 0.7(1) + 0.2(0) + 0 = 0.7 \]
\[ z_{t+1} = 0.2 y_t + 0.7 z_t = 0.2(1) + 0.7(0) + 0 = 0.2 \]

At period $t+2$
set $\varepsilon_{y,t+2} = 0$, $\varepsilon_{z,t+2} = 0$
=>
\[ y_{t+2} = 0.7 y_{t+1} + 0.2 z_{t+1} = 0.7(0.7) + 0.2(0.2) + 0 = 0.53 \]
\[ z_{t+2} = 0.2 y_{t+1} + 0.7 z_{t+1} = 0.2(0.7) + 0.7(0.2) + 0 = 0.28 \]

At period $t+3$
set $\varepsilon_{y,t+3} = 0$, $\varepsilon_{z,t+3} = 0$
=>
\[ y_{t+3} = 0.7 y_{t+2} + 0.2 z_{t+2} = 0.7(0.53) + 0.2(0.28) + 0 = 0.43 \]
\[ z_{t+3} = 0.2 y_{t+2} + 0.7 z_{t+2} = 0.2(0.53) + 0.7(0.28) + 0 = 0.30 \]

At period $t+4$
set $\varepsilon_{y,t+4} = 0$, $\varepsilon_{z,t+4} = 0$
=>
\[ y_{t+4} = 0.7 y_{t+3} + 0.2 z_{t+3} = 0.7(0.43) + 0.2(0.30) + 0 = 0.36 \]
\[ z_{t+4} = 0.2 y_{t+3} + 0.7 z_{t+3} = 0.2(0.43) + 0.7(0.30) + 0 = 0.30 \]

At period $t+5$
set $\varepsilon_{y,t+5} = 0$, $\varepsilon_{z,t+5} = 0$
=>
\[ y_{t+5} = 0.7 y_{t+4} + 0.2 z_{t+4} = 0.7(0.36) + 0.2(0.30) + 0 = 0.31 \]
\[ z_{t+5} = 0.2 y_{t+4} + 0.7 z_{t+4} = 0.2(0.36) + 0.7(0.30) + 0 = 0.28 \]

At period $t+6$
set $\varepsilon_{y,t+6} = 0$, $\varepsilon_{z,t+6} = 0$
=>
\[ y_{t+6} = 0.7 y_{t+5} + 0.2 z_{t+5} = 0.7(0.31) + 0.2(0.28) + 0 = 0.27 \]
\[ z_{t+6} = 0.2 y_{t+5} + 0.7 z_{t+5} = 0.2(0.31) + 0.7(0.28) + 0 = 0.26 \]

Stationarity assures that the impulse responses ultimately decay

\[ \lim_{n \to \infty} y_{t+n} = 0 \]

\[ \lim_{n \to \infty} z_{t+n} = 0 \]
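The period-by-period calculations for both shocks can be reproduced with a short loop. This is a minimal sketch for the VAR(1) of the example, with $a_{10} = a_{20} = 0$, $b_{12} = -0.8$ and $b_{21} = 0$; the function name `impulse_responses` is hypothetical.

```python
import numpy as np

# Coefficients of the example VAR(1) and the mapping e_t = B^{-1} eps_t
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
B_inv = np.array([[1.0, 0.8],    # e_1t = eps_yt + 0.8 * eps_zt
                  [0.0, 1.0]])   # e_2t = eps_zt

def impulse_responses(A1, B_inv, shock, horizon):
    """Responses of (y, z) at t, t+1, ..., t+horizon to a one-off structural shock at t."""
    x = B_inv @ shock            # impact period: lagged values are zero
    path = [x]
    for _ in range(horizon):
        x = A1 @ x               # later periods: no further shocks
        path.append(x)
    return np.array(path)

irf_z = impulse_responses(A1, B_inv, np.array([0.0, 1.0]), 4)   # eps_zt = 1
irf_y = impulse_responses(A1, B_inv, np.array([1.0, 0.0]), 6)   # eps_yt = 1
print(np.round(irf_z, 4))
print(np.round(irf_y, 4))
```

Because the loop carries unrounded values forward, its later entries can differ in the second decimal from a hand calculation that rounds at each step; for example, the response of $y$ six periods after the $\varepsilon_{yt}$ shock is about 0.27.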
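2.8 Impulse response analysis: Sensitivity analysis
Does the assumed causal ordering affect the structural inferences?

If $\Sigma$ is close to diagonal => $B$ is close to diagonal (identity)
=> the ordering does not matter

=> the importance of the ordering depends on

\[ \rho_{12} = \frac{\mathrm{Cov}(e_{1t}, e_{2t})}{\sqrt{\mathrm{Var}(e_{1t}) \mathrm{Var}(e_{2t})}} = \frac{\sigma_{12}}{\sigma_1 \sigma_2} \]

$H_0$: $\Sigma$ diagonal

\[ \hat{\rho}_{12} = \frac{\widehat{\mathrm{Cov}}(\hat{e}_{1t}, \hat{e}_{2t})}{\sqrt{\widehat{\mathrm{Var}}(\hat{e}_{1t}) \widehat{\mathrm{Var}}(\hat{e}_{2t})}} = \frac{\hat{\sigma}_{12}}{\hat{\sigma}_1 \hat{\sigma}_2} \]

\[ LM = T \hat{\rho}_{12}^2 \sim \chi^2_1 \]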
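The test statistic is simple to compute from the reduced-form residuals. The sketch below uses a tiny made-up residual matrix so that the arithmetic can be followed by hand; the function name and the data are hypothetical, and 3.841 is the 5% critical value of the $\chi^2$ distribution with one degree of freedom.

```python
import numpy as np

def lm_diagonal_test(resid):
    """LM statistic T * rho^2 for H0: the 2x2 residual covariance matrix is diagonal."""
    T = resid.shape[0]
    Sigma_hat = resid.T @ resid / T               # residuals assumed to have mean zero
    rho = Sigma_hat[0, 1] / np.sqrt(Sigma_hat[0, 0] * Sigma_hat[1, 1])
    return rho, T * rho ** 2

# Made-up residuals for illustration (each column sums to zero)
resid = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0],
                  [-1.0, 1.0], [1.0, 1.0], [-1.0, -1.0]])
rho, lm = lm_diagonal_test(resid)

CRIT_5PCT = 3.841                                 # chi-square(1) critical value at 5%
reject_diagonal = lm > CRIT_5PCT
print(rho, lm, reject_diagonal)
```

If the null is not rejected, the ordering of the variables in the Choleski decomposition should matter little for the impulse responses.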
2.9 Examples from macro VARs

The VAR models of the monetary transmission mechanism are not estimated to give advice on the best monetary policy. Rather they are estimated to provide empirical evidence on the response of macroeconomic variables to monetary policy impulses.

It is interesting to see how the specification of the standard VAR model has developed over time. Initially models were estimated on a rather limited set of variables, i.e. prices, output (real activity) and money (monetary policy). The underlying structural model is specified as follows

\[ p_t = b_{10} + \gamma_{11} p_{t-1} + \gamma_{12} y_{t-1} + \gamma_{13} m_{t-1} + \varepsilon_{pt} \]
\[ y_t = b_{20} - b_{21} p_t + \gamma_{21} p_{t-1} + \gamma_{22} y_{t-1} + \gamma_{23} m_{t-1} + \varepsilon_{yt} \]
\[ m_t = b_{30} - b_{31} p_t - b_{32} y_t + \gamma_{31} p_{t-1} + \gamma_{32} y_{t-1} + \gamma_{33} m_{t-1} + \varepsilon_{mt} \]

$p_t$ is contemporaneously independent of $y_t$, $m_t$
$y_t$ is contemporaneously independent of $m_t$

This is a just-identification scheme, where the identification of structural shocks depends on the ordering of variables. It corresponds to a recursive economic structure, with the most endogenous variable ordered last.

Causal ordering → $p_t \to y_t \to m_t$

Intuitively, inflation shock (supply shock) → output → monetary policy

or

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \Sigma_\varepsilon) \]

where

\[ B = \begin{pmatrix} 1 & 0 & 0 \\ b_{21} & 1 & 0 \\ b_{31} & b_{32} & 1 \end{pmatrix}, \quad x_t = \begin{pmatrix} p_t \\ y_t \\ m_t \end{pmatrix}, \quad \Gamma_0 = \begin{pmatrix} b_{10} \\ b_{20} \\ b_{30} \end{pmatrix}, \quad \Gamma_1 = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} \end{pmatrix}, \quad \varepsilon_t = \begin{pmatrix} \varepsilon_{pt} \\ \varepsilon_{yt} \\ \varepsilon_{mt} \end{pmatrix} \]

Identification is Choleski-type with money ordered last.


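This VAR model can be extended to include short-term interest rates just before money as a penultimate variable in the Choleski identification. The idea is to see the robustness of the above results after identifying the part of money which is endogenous to the interest rate. More specifically, the underlying structural model is specified as follows

\[ p_t = b_{10} + \gamma_{11} p_{t-1} + \gamma_{12} y_{t-1} + \gamma_{13} i_{t-1} + \gamma_{14} m_{t-1} + \varepsilon_{pt} \]
\[ y_t = b_{20} - b_{21} p_t + \gamma_{21} p_{t-1} + \gamma_{22} y_{t-1} + \gamma_{23} i_{t-1} + \gamma_{24} m_{t-1} + \varepsilon_{yt} \]
\[ i_t = b_{30} - b_{31} p_t - b_{32} y_t + \gamma_{31} p_{t-1} + \gamma_{32} y_{t-1} + \gamma_{33} i_{t-1} + \gamma_{34} m_{t-1} + \varepsilon_{it} \]
\[ m_t = b_{40} - b_{41} p_t - b_{42} y_t - b_{43} i_t + \gamma_{41} p_{t-1} + \gamma_{42} y_{t-1} + \gamma_{43} i_{t-1} + \gamma_{44} m_{t-1} + \varepsilon_{mt} \]

$p_t$ is contemporaneously independent of $y_t$, $i_t$, $m_t$
$y_t$ is contemporaneously independent of $i_t$, $m_t$
$i_t$ is contemporaneously independent of $m_t$

Causal ordering → $p_t \to y_t \to i_t \to m_t$
or

\[ B x_t = \Gamma_0 + \Gamma_1 x_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim WN(0, \Sigma_\varepsilon) \]

where

\[ B = \begin{pmatrix} 1 & 0 & 0 & 0 \\ b_{21} & 1 & 0 & 0 \\ b_{31} & b_{32} & 1 & 0 \\ b_{41} & b_{42} & b_{43} & 1 \end{pmatrix}, \quad x_t = \begin{pmatrix} p_t \\ y_t \\ i_t \\ m_t \end{pmatrix}, \quad \Gamma_0 = \begin{pmatrix} b_{10} \\ b_{20} \\ b_{30} \\ b_{40} \end{pmatrix}, \]

\[ \Gamma_1 = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \gamma_{13} & \gamma_{14} \\ \gamma_{21} & \gamma_{22} & \gamma_{23} & \gamma_{24} \\ \gamma_{31} & \gamma_{32} & \gamma_{33} & \gamma_{34} \\ \gamma_{41} & \gamma_{42} & \gamma_{43} & \gamma_{44} \end{pmatrix}, \quad \varepsilon_t = \begin{pmatrix} \varepsilon_{pt} \\ \varepsilon_{yt} \\ \varepsilon_{it} \\ \varepsilon_{mt} \end{pmatrix} \]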


Some evidence from the literature 

After a contractionary monetary policy shock, plausible
models of the monetary transmission mechanism should be
consistent at least with the following evidence on prices,
output and interest rates: (i) the price level initially responds
very little, (ii) interest rates initially rise, and (iii) output
initially falls, with a J-shaped response and a zero long-run
effect of the monetary impulse.

Having identified the ‘monetary rule’ by proposing an
explicit solution to the problem of the endogeneity of 
money, the VAR method focuses on deviation from the 
rule. Deviations from the rule are obtained either by 
changing the systematic component of monetary policy or 
by considering exogenous shocks, which leave monetary 
policy unaltered. In the former case the deviation from the 
rule is obtained by changing some parameters in the B 
matrix describing the simultaneous relations among 
variables, while in the latter case the parameters of the 
matrix B are not changed. Consider for example the case of 
interest rate targeting. The first type of deviations is 
obtained by modifying the response of the Central Bank’s 
interest rate to macroeconomic conditions (fluctuations in 
output and prices), while the second type of deviations is 
obtained by considering an exogenous shock which does 
not change the response of the monetary policy-maker to 
macroeconomic conditions. VAR modeling has focused on
simulating shocks, leaving the systematic component of
monetary policy unchanged.

Focusing on the shocks is important, since only when the
Central Bank deviates from its rule does it become possible to
collect interesting information on the response of
macroeconomic variables to monetary policy impulses
(shocks): this is the best opportunity to detect the response of
macroeconomic variables to monetary policy impulses
unexpected by the market.


Often there are difficulties with interpreting shocks to 
interest rates as monetary policy shocks. The response of 
prices to an innovation (error) in interest rates gives rise to 
the ‘price puzzle’—prices increase significantly after an 
interest rate hike. The ‘price puzzle’ may be due to mis- 
specification of the VAR model. Suppose monetary policy 
reacts to expected inflation, then we have an omitted 
variable from the VAR positively related to inflation and 
interest rates. Such omission makes the VAR mis-specified 
and (partly?) explains the positive relation between prices and
interest rates observed in the impulse response functions.
2.10 Forecasting
Consider

VAR(1): $x_t = A_0 + A_1 x_{t-1} + e_t$

The model 1-period ahead
\[ x_{t+1} = A_0 + A_1 x_t + e_{t+1} \]
Produce 1-period ahead forecast
\[ x^f_{t+1} = A_0 + A_1 x_t \]
since the forecast for $e_{t+1}$ is (on average) zero

The model 2-periods ahead
\[ x_{t+2} = A_0 + A_1 x_{t+1} + e_{t+2} \]
Produce 2-periods ahead forecast
\[ x^f_{t+2} = A_0 + A_1 x^f_{t+1} \]
since the forecast for $e_{t+2}$ is (on average) zero

The model 3-periods ahead
\[ x_{t+3} = A_0 + A_1 x_{t+2} + e_{t+3} \]
Produce 3-periods ahead forecast
\[ x^f_{t+3} = A_0 + A_1 x^f_{t+2} \]
since the forecast for $e_{t+3}$ is zero

Consider

VAR(3): $x_t = A_0 + A_1 x_{t-1} + A_2 x_{t-2} + A_3 x_{t-3} + e_t$

The model 1-period ahead
\[ x_{t+1} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} + e_{t+1} \]
Produce 1-period ahead forecast
\[ x^f_{t+1} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} \]
since the forecast for $e_{t+1}$ is (on average) zero

The model 2-periods ahead
\[ x_{t+2} = A_0 + A_1 x_{t+1} + A_2 x_t + A_3 x_{t-1} + e_{t+2} \]
Produce 2-periods ahead forecast
\[ x^f_{t+2} = A_0 + A_1 x^f_{t+1} + A_2 x_t + A_3 x_{t-1} \]
since the forecast for $e_{t+2}$ is (on average) zero

The model 3-periods ahead
\[ x_{t+3} = A_0 + A_1 x_{t+2} + A_2 x_{t+1} + A_3 x_t + e_{t+3} \]
Produce 3-periods ahead forecast
\[ x^f_{t+3} = A_0 + A_1 x^f_{t+2} + A_2 x^f_{t+1} + A_3 x_t \]
since the forecast for $e_{t+3}$ is zero
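The iterated scheme amounts to feeding each forecast back into the model as if it were data. The sketch below implements this for a general VAR(p); the function name `iterated_forecast`, the coefficient values and the last observation are assumptions used for illustration.

```python
import numpy as np

def iterated_forecast(A0, A_list, history, horizon):
    """Iterated forecasts from a VAR(p): A_list = [A1, ..., Ap], history newest last."""
    p = len(A_list)
    past = [np.asarray(v, dtype=float) for v in history[-p:]]
    forecasts = []
    for _ in range(horizon):
        f = A0.copy()
        for j, A in enumerate(A_list, start=1):
            f = f + A @ past[-j]          # A_j times the j-th most recent value
        forecasts.append(f)
        past.append(f)                    # the forecast replaces the unknown future value
    return np.array(forecasts)

# Assumed VAR(1) with zero intercept and last observation x_t = (0.8, 1.0)
A0 = np.zeros(2)
A1 = np.array([[0.7, 0.2],
               [0.2, 0.7]])
fc = iterated_forecast(A0, [A1], [np.array([0.8, 1.0])], horizon=3)
print(fc)
```

For a VAR(3) the same function would be called with `A_list = [A1, A2, A3]` and the three most recent observations in `history`.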


Forecast uncertainty

In large samples, $x^f_{t+h} \sim N(x_{t+h}, \mathrm{Var}(x^f_{t+h}))$. We can construct confidence intervals

\[ x^f_{t+h} \pm 1.96 \times se(x^f_{t+h}) \]

-- An example using Gretl --

The iterated forecast method versus the multiperiod forecast method

So far, we have looked at the iterated forecast method. Another way to obtain forecasts is by using the multiperiod forecast method.


Multiperiod forecasts
Consider

VAR(3): $x_t = A_0 + A_1 x_{t-1} + A_2 x_{t-2} + A_3 x_{t-3} + e_t$

The model 1-period ahead

\[ x_{t+1} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} + e_{t+1} \]
Produce 1-period ahead forecast

\[ x^f_{t+1} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} \]
since the forecast for $e_{t+1}$ is (on average) zero

The model 2-periods ahead
\[ x_{t+2} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} + e_{t+2} \]
Produce 2-periods ahead forecast
\[ x^f_{t+2} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} \]
since the forecast for $e_{t+2}$ is (on average) zero

The model 3-periods ahead
\[ x_{t+3} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} + e_{t+3} \]
Produce 3-periods ahead forecast
\[ x^f_{t+3} = A_0 + A_1 x_t + A_2 x_{t-1} + A_3 x_{t-2} \]
since the forecast for $e_{t+3}$ is zero

Here the coefficient matrices are re-estimated for each horizon by regressing $x_{t+h}$ directly on $x_t, x_{t-1}, x_{t-2}$; the same symbols are reused only to keep the notation simple.

If the model is correctly specified, the iterated method is more precise. Otherwise, iterating can lead to biased forecasts, and the multiperiod forecast method is preferred.
This book treats methods of applied econometrics with a particular
focus on applications in macroeconomics. Topics
include macroeconomic data, panel data models,
unobserved heterogeneity, model comparison,
endogeneity, dynamic econometric models, vector
autoregressions, forecast evaluation and structural
identification. The book provides undergraduate
students with the necessary knowledge to
undertake econometric analysis in modern
macroeconomic research.


